Abstract



We view Digital Ecosystems as the digital counterparts of biological ecosystems, which
are considered to be robust, self-organising and scalable architectures that can automatically 
solve complex, dynamic problems. So, this work is concerned with the creation, investigation, 
and optimisation of Digital Ecosystems, exploiting the self-organising properties of biological 
ecosystems. First, we created the Digital Ecosystem, a novel optimisation technique inspired 
by biological ecosystems, where the optimisation works at two levels: a first optimisation, the
migration of agents that are distributed in a decentralised peer-to-peer network, operating
continuously in time; this process feeds a second optimisation based on evolutionary computing 
that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant 
constraints. We then investigated its self-organising aspects, starting with an extension 
to the definition of Physical Complexity to include the evolving agent populations of our 
Digital Ecosystem. Next, we established stability of evolving agent populations over time, 
by extending the Chli-DeWilde definition of agent stability to include evolutionary dynamics.
Further, we evaluated the diversity of the software agents within evolving agent populations, 
relative to the environment provided by the user base. To conclude, we considered alternative 
augmentations to optimise and accelerate our Digital Ecosystem, by studying the accelerating 
effect of a clustering catalyst on the evolutionary dynamics of our Digital Ecosystem, through 
the direct acceleration of the evolutionary processes. We also studied the optimising effect of 
targeted migration on the ecological dynamics of our Digital Ecosystem, through the indirect 
and emergent optimisation of the agent migration patterns. Overall, we have advanced the 
understanding of creating Digital Ecosystems, the self-organisation that occurs within them, 
and the optimisation of their Ecosystem-Oriented Architecture.
Chapter 1



Introduction 



1.1 Motivation and Objectives 

Is mimicking ecosystems the future of information systems?

A key challenge in modern computing is to develop systems that address complex, dynamic 
problems in a scalable and efficient way, because the increasing complexity of software makes 
designing and maintaining efficient and flexible systems a growing challenge [209, 299, 193]. 
With the ever-expanding number of services offered online, as Application
Programming Interfaces (APIs) are made public, there is an ever-growing number of
computational units available to be combined in the creation of applications. However, this
is currently a task done manually by programmers, and it has been argued [184] that current 
software development techniques have hit a complexity wall, which can only be overcome by 
automating the search for new algorithms. There are several existing efforts aimed at achieving 
this automated service composition [203, 226, 207, 255], the most prevalent of which is
Service-Oriented Architectures and their associated standards and technologies [66, 320].

Alternatively, nature has been in the research business for 3.8 billion years and in that time has 
accumulated close to 30 million well-adjusted solutions to a plethora of design challenges that 
humankind struggles to address with mixed results [33]. Biomimicry is a discipline that seeks 
solutions by emulating nature's designs and processes, and there is considerable opportunity 
to learn elegant solutions for human-made problems [33]. Biological ecosystems are thought 
to be robust, scalable architectures that can automatically solve complex, dynamic problems, 
possessing several properties that may be useful in automated systems. These properties include 
self-organisation, self-management, scalability, the ability to provide complex solutions, and the 
automated composition of these complex solutions [173]. 

Therefore, an approach to the aforementioned challenge would be to develop Digital Ecosystems, 
artificial systems that aim to harness the dynamics that underlie the complex and diverse 
adaptations of living organisms in biological ecosystems. While evolution may be well 
understood in computer science under the auspices of evolutionary computing [90], ecological 
models are not. The possible connections between Digital Ecosystems and their biological 
counterparts are yet to be closely examined, so potential exists to create an Ecosystem-Oriented 
Architecture with the essential elements of biological ecosystems, where the word ecosystem is 
more than just a metaphor. We propose that an ecosystem-inspired approach would be more
effective at greater scales than traditionally inspired approaches, because it would be built upon 
the scalable and self-organising properties of biological ecosystems [173]. However, ecological 
succession, the formation of a mature ecosystem from the predictable and orderly changes in 
the composition and structure of an ecological community [29], is a slow process. So, for our 
Digital Ecosystems it will be desirable to accelerate and optimise the equivalent process, which 
may be possible through the application of augmentations that interact with the ecosystem 
dynamics. Therefore, the primary objectives are as follows: 

• Determine the structure of an Ecosystem-Oriented Architecture and so create Digital 
Ecosystems, which are the digital counterpart of biological ecosystems, and so have 
analogous properties of self-organisation, scalability and sustainability. 

• Develop an understanding of the self-organising behaviour within a Digital Ecosystem, 
learning where and how it occurs, what forms it can take, and how it can be quantified. 

• Investigate if we can accelerate or optimise the evolutionary and ecological self-organising 
dynamics of Digital Ecosystems, exploring how alternative augmentations interact with 
the ecosystem dynamics. 
1.2 Contributions



Substantial parts of our efforts are original contributions in the area of Biologically-Inspired 
Computing [99] and the emerging field of Digital Ecosystems, with our major research 
contributions being as follows: 

• We have determined the fundamentals for a new class of system, Digital Ecosystems, 
created through combining understanding from theoretical ecology, evolutionary theory,
Multi-Agent Systems, distributed evolutionary computing, and Service-Oriented
Architectures.

• We have investigated where and how self-organisation occurs in Digital Ecosystems, what 
forms it can take and how it can be quantified, including the self-organised complexity, 
stability, and diversity of the evolving agent populations within. 

• We have extended the statistical physics based definition of Physical Complexity, to 
include evolving agent populations. This required extending definitions for populations 
of variable length sequences, creating a measure for the efficiency of information storage, 
and an understanding of clustering within populations to support the non-atomicity of 
agents. 

• We have extended the Chli-DeWilde definition of agent stability to include the evolutionary
dynamics of evolving agent populations. We then built upon this to construct an
entropy-based definition for the degree of instability, which was used to study the stability 
of evolving agent populations under varying conditions. 

• We have developed an understanding and definition for the self-organised diversity, finding 
no existing definition suitable because of the unique hybrid nature of Digital Ecosystems. 
We therefore considered the global distribution of the agents in the populations relative 
to the varying requirements of the user base. 

• We have investigated alternative augmentations to optimise and accelerate our Digital 
Ecosystems, studying the accelerating effect of a clustering catalyst on the evolutionary 
dynamics, and the optimising effect of targeted migration on the ecological dynamics. 
1.4 Dissertation Outline



In Chapter 2, we explain the hybrid model created to provide the digital counterpart of a 
biological ecosystem. We start with the relevant theory from the domain of theoretical biology, 
including the fields of evolutionary and ecological theory, and from the domain of computer 
science, including the fields of Multi-Agent Systems, evolutionary computing and
Service-Oriented Architectures. The Digital Ecosystem is then measured experimentally through
simulations, with measures originating from theoretical ecology, to evaluate its likeness to a
biological ecosystem. This includes its responsiveness to requests for applications from the
user base, as a measure of the ecological succession (ecosystem maturity).

Chapter 3 investigates the self-organising aspects of Digital Ecosystems. We start with 
the complexity of the evolving agent populations within, by extending the statistical physics 
based definition of Physical Complexity to support variable length populations of software 
agents. Next, we investigate the stability of the evolving agent populations, by extending 
the Chli-DeWilde definition of agent stability to include the evolutionary dynamics of Digital
Ecosystems. Finally, we study the diversity of the agents within the evolving agent populations 
of the Digital Ecosystem, for optimality relative to the environment provided by the user base. 

In Chapter 4, we start by considering alternative augmentations to optimise and accelerate 
Digital Ecosystems. We then further investigate the most promising, the clustering catalyst and 
targeted migration: the accelerating effect of a clustering catalyst on the evolutionary dynamics 
of our Digital Ecosystem, through the direct acceleration of the evolutionary processes; and 
the optimising effect of targeted migration on the ecological dynamics of our Digital Ecosystem, 
through the indirect and emergent optimisation of the agent migration patterns. 

Chapter 5 provides a summary of the conclusions, and suggests possible future research into 
Digital Ecosystems. We also report on the status of the reference implementation for Digital 
Ecosystems, and the dedicated simulation framework created for its future study. After this 
the Bibliography follows. 
Chapter 2

Creation of Digital Ecosystems 



In this chapter we create Digital Ecosystems, starting with a discussion of the relevant literature, 
including Nature Inspired Computing as a framework in which to understand this work, 
and the process of biomimicry to be used in mimicking the necessary biological processes to 
create Digital Ecosystems. We then consider the relevant theoretical ecology in creating the 
digital counterpart of a biological ecosystem, including the topological structure of ecosystems, 
and evolutionary processes within distributed environments. This leads to a discussion of 
the relevant fields from computer science for the creation of Digital Ecosystems, including 
evolutionary computing, Multi-Agent Systems, and Service-Oriented Architectures. We then 
define Ecosystem-Oriented Architectures for the creation of Digital Ecosystems, imbued with
the properties of self-organisation, scalability and sustainability from biological ecosystems,
including a novel form of distributed evolutionary computing. This will include a discussion of
the compromises resulting from the hybrid model created, such as the network topology. We
then perform simulations to compare the likeness of our Digital Ecosystem with biological
ecosystems, starting with ecological succession (development), measured by its responsiveness to 
requests for applications from the user base, and followed by the measures of species abundance 
and the species-area relationship, which are commonly applied to biological ecosystems. Finally, 
we conclude with a summary and discussion of the achievements, including the experimental 
results. 

2.1 Background Theory 



In this section we discuss the relevant background theory, and because of the interdisciplinary 
nature of our research it will cover several fields across different domains. We start with an 
introduction to Nature Inspired Computing, followed by the relevant theoretical biology and 
computer science. With the theoretical biology, we will consider how properties of biological 
ecosystems influence functions that are relevant to developing Digital Ecosystems to solve 
practical problems. This leads us to suggest ways in which concepts from ecology can be used 
in biologically inspired techniques to create Digital Ecosystems. 



2.1.1 Existing Digital Ecosystems 



Our focus is on creating the digital counterpart of biological ecosystems. However, the term
digital ecosystem has been used to describe a variety of concepts, which we shall now review. It
sometimes refers to the existing networking infrastructure of the Internet [79, 27, 94, 337], while several
companies offer a digital ecosystem service, which involves enabling customers to use existing 
e-business solutions [32, 160, 315]. The term is also being increasingly linked to the future 
developments of Information and Communications Technology (ICT) adoption for e-business, 
to support business ecosystems [214]. However, perhaps the most frequent references to digital 
ecosystems arise in Artificial Life research, where they are created primarily to investigate 
aspects of biological and other complex systems [295, 114, 55]. 

The extent to which these disparate systems resemble biological ecosystems varies, and 
frequently the word ecosystem is merely used for branding purposes without any inherent 
ecological properties. We consider Digital Ecosystems to be software systems that exploit 
the properties of biological ecosystems, and suggest that several key features of biological 
ecosystems have not been fully explored in existing digital ecosystems. So, we will now discuss 
how mimicking these features can create Digital Ecosystems, which are robust, scalable, and 
self-organising. 
2.1.2 Nature-Inspired Computing



Biomimicry (bios, meaning life, and mimesis, meaning to imitate) is the science that studies 
nature, its models, systems, processes, and elements, and then imitates or takes creative 
inspiration from them for the study and design of engineering systems and modern technology 
[33]. This concept is far from new, with humans having long been inspired by the animals and
plants of the natural world; Leonardo Da Vinci himself once said, "Those who are inspired by
a model other than Nature, a mistress above all masters, are labouring in vain" [40]. Albeit
overstating the point, it reminds us that the transfer of technology between life-forms and 
synthetic constructs is desirable because evolutionary pressures typically force living organisms 
to become highly optimised and efficient. A classical example is the development of dirt and 
water repellent paint from the observation that the surface of the lotus flower plant is practically 
non-sticky for anything, commonly known as the lotus effect [25]. However, biomimicry, when 
done well, is not slavish imitation; it is inspiration using the principles which nature has 
demonstrated to be successful design strategies. For example, in the early days of mechanised 
flight the best designs were not the ornithopters, which most completely imitated birds, but 
the fixed-wing craft that used the principle of aerofoil cross-section in their wings [10].

Biomimicry in computer science is called Nature Inspired Computing (NIC) or Natural 
Computation, and the benefits of natural computation technologies often mimic those found in 
real natural systems, and include flexibility, adaptability, robustness, and decentralised control 
[73]. The increasing demands upon current computer systems, along with technological changes,
create a need for more flexible and adaptable systems. The desire to achieve this has led many
computing researchers to look to natural systems for inspiration in the design of computer
software and hardware, as natural systems provide many examples of the type of versatile
system required [73]. Their sources of inspiration come from many aspects of natural systems:
evolution, ecology, development, cell and molecular phenomena, behaviour, cognition, and other
areas [195]. The use of nature inspired techniques often results in the design of novel computing
systems with applicability in many different areas [195]. NIC itself can be divided into three 
main branches [73]: 
• Biologically-Inspired Computing (BIC): This makes use of nature as inspiration for the
development of problem solving techniques. The main idea of this branch is to develop 
computational tools (algorithms) by taking inspiration from nature for the solution of 
complex problems. 

• The simulation and emulation of nature by computational means: This is basically 
a synthetic process aimed at creating patterns, forms, behaviours, and organisms 
that resemble life-as-we-know-it. Its products can be used to mimic various natural 
phenomena, thus increasing our understanding of nature and insights about computer 
models. 

• Computing with natural materials: This corresponds to the use of natural materials to 
perform computation, to substitute or supplement the current silicon-based computers. 

All branches share the common characteristic of human-designed computing inspired by nature, 
the metaphorical use of concepts, principles, and mechanisms underlying natural systems. Thus, 
evolutionary algorithms use the concepts of mutation, recombination, and natural selection 
from biology; neural networks are inspired by the highly interconnected neural structures in 
the brain and the nervous system; molecular computing is based on paradigms from molecular 
biology; and quantum computing based on quantum physics exploits quantum parallelism [73]. 
There are, however, important methodological differences between various sub-areas of natural
computing. For example, evolutionary algorithms and algorithms based on neural networks are 
presently implemented on conventional computers. However, molecular computing also aims 
at alternatives for silicon hardware by implementing algorithms in biological hardware, using 
DNA molecules and enzymes. Also, quantum computing aims at non-traditional hardware that 
can make use of quantum effects [73]. 

We are concerned with BIC, which relies heavily on the fields of biology, computer science, and 
mathematics. Briefly put, it is the study of nature to improve the usage of computers [99], 
and should not be confused with computational biology [326], which is an interdisciplinary
field that applies the techniques of computer science, applied mathematics, and statistics to 
address problems inspired by biology. BIC has produced Neural Networks, swarm intelligence 
and evolutionary computing [99]. Introducing BIC, one comes quickly to its applications, partly
because this is the essence of the approach, and partly because biomimicry as a process tends
to be unformalised and ad hoc [130]. It generally involves an engineer or scientist observing or
being aware of an area of biological study, which seems applicable to a technology or research
problem they are currently tackling, or which inspires the creation of a new technology [73].
However, there are some common steps in this process, which starts with identifying some
behaviour from a biological system that appears to be useful. This is followed by observation
to understand the mechanisms or principles by which it operates, thereby allowing an
abstract understanding of the behaviour. This can then be mimicked in a non-biological system
and its performance and effectiveness evaluated [130]. This process is summarised in Figure 2.1.

Figure 2.1: Biomimicry Design Spiral (modified from [130]): The process of biomimicry starts
with identifying some behaviour from a biological system that appears to be useful.
This is followed by observation to understand the mechanisms or principles by which it operates,
thereby allowing an abstract understanding of the behaviour. This can then be mimicked
in a non-biological system and its performance and effectiveness evaluated [130].

2.1.3 Biology of Digital Ecosystems 

Natural science is the study of the universe via the rules or laws of natural order, and the 
term is also used to differentiate those fields using scientific method in the study of nature, 
in contrast with the social sciences which apply the scientific method to culture and human 
behaviour: economics, psychology, political economy, anthropology, etc. [135]. The fields of
natural science are diverse, ranging from particle physics to astronomy [273], and while not all
these fields of study will provide paradigms for Digital Ecosystems, the further one wishes to 
take the analogy of the word ecosystem, the more one has to consider the relevance of the fields 
of natural science, particularly the biological sciences. 

A primary motivation for our research in Digital Ecosystems is the desire to exploit the self- 
organising properties of biological ecosystems. Ecosystems are thought to be robust, scalable 
architectures that can automatically solve complex, dynamic problems [173]. However, the 
biological processes that contribute to these properties have not been made explicit in Digital 
Ecosystems research. Here, we discuss how biological properties contribute to the self-organising 
features of biological ecosystems, including population dynamics, evolution, a complex dynamic 
environment, and spatial distributions for generating local interactions [309]. The potential for 
exploiting these properties in artificial systems is then considered. We suggest that several key 
features of biological ecosystems have not been fully explored in existing digital ecosystems, and 
discuss how mimicking these features may assist in developing robust, scalable self-organising 
architectures. 

Evolutionary computing uses natural selection to evolve solutions [110]; it starts with a set of 
possible solutions chosen arbitrarily, then selection, replication, recombination, and mutation 
are applied iteratively. Selection is based on conforming to a fitness function which is determined
by a specific problem of interest, so that over time better solutions to the problem can evolve
[110]. As Digital Ecosystems will likely solve problems by evolving solutions, they will probably
incorporate some form of evolutionary computing. However, we suggest that Digital Ecosystems
should also incorporate additional features, providing them with a closer resemblance to biological
ecosystems. These include features such as complex dynamic fitness functions, a distributed or
network environment, and self-organisation arising from interactions among organisms and
their environment, which we will now discuss.
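
The iterative cycle of selection, replication, recombination, and mutation described above can be illustrated with a minimal sketch. It is not the evolutionary computing used in our Digital Ecosystem; the bit-string representation, the tournament selection, the one-point crossover, the function name evolve, and every parameter value are assumptions chosen only to keep the example short.

```python
import random

def evolve(fitness, genome_length=20, population_size=30, generations=50,
           mutation_rate=0.02, seed=0):
    """Evolve bit-string solutions; all defaults are illustrative assumptions."""
    rng = random.Random(seed)
    # Start with a set of possible solutions chosen arbitrarily.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]

    def select(candidates):
        # Tournament selection: the fitter of two randomly drawn individuals.
        a, b = rng.choice(candidates), rng.choice(candidates)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Replication: the current best individual is copied unchanged (elitism).
        next_population = [max(population, key=fitness)[:]]
        while len(next_population) < population_size:
            mother, father = select(population), select(population)
            # Recombination: one-point crossover of the two parents.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Mutation: each bit flips with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)

# Hypothetical problem of interest: maximise the number of 1 bits.
best = evolve(fitness=sum)
print(sum(best), "of", len(best), "bits set")
```

Here the fitness function is fixed for the whole run, which is exactly the simplification that a Digital Ecosystem would need to relax in favour of complex, dynamic fitness functions.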

Arguably the most fundamental differences between biological and digital ecosystems lie in the 
motivation and approach of their respective researchers. Biological ecosystems are ubiquitous 
natural phenomena whose maintenance is crucial to our survival [20], developing through the 
process of ecological succession [29]. In contrast, Digital Ecosystems will be defined here as a 
technology engineered to serve specific human purposes, developing to solve dynamic problems
in parallel with high efficiency. 

2.1.3.1 Biological Ecosystems 



Figure 2.2: Ecosystem Structure (redrawn from [261]): A stable, self-perpetuating system 
made up of one or more communities of organisms, consisting of species in their habitats, with 
their populations existing in their respective micro-habitats [29]. A community is a naturally 
occurring group of populations from different species that live together, and interact as a self- 
contained unit in the same habitat. A habitat is a distinct part of the environment [29]. 

An ecosystem is a natural unit made up of living (biotic) and non-living (abiotic) components, 
from whose interactions emerge a stable, self-perpetuating system. It is made up of one or 
more communities of organisms, consisting of species in their habitats, with their populations 
existing in their respective micro-habitats [29]. A community is a naturally occurring group 
of populations from different species that live together, and interact as a self-contained unit 
in the same habitat. A habitat is a distinct part of the environment [29], for example, a 
stream. Individual organisms migrate through the ecosystem into different habitats competing 
with other organisms for limited resources, with a population being the aggregate number of 
the individuals, of a particular species, inhabiting a specific habitat or micro-habitat [29]. A 
micro-habitat is a subdivision of a habitat that possesses its own unique properties, such as 
a micro-climate [168]. Evolution occurs to all living components of an ecosystem, with the
evolutionary pressures varying from one population to the next depending on the environment 
that is the population's habitat. A population, in its micro-habitat, comes to occupy a niche, 
which is the functional relationship of a population to the environment that it occupies. A 
niche results in the highly specialised adaptation of a population to its micro-habitat [168]. 

2.1.3.2 Fitness Landscapes and Agents 

As described above, an ecosystem comprises both an environment and a set of interacting, 
reproducing entities (or agents) in that environment; with the environment acting as a set of 
physical and chemical constraints on reproduction and survival [29]. These constraints can 
be considered in abstract using the metaphor of the fitness landscape, in which individuals 
are represented as solutions to the problem of survival and reproduction [335]. All possible 
solutions are distributed in a space whose dimensions are the possible properties of individuals. 
An additional dimension, height, indicates the relative fitness (in terms of survival and 
reproduction) of each solution. The fitness landscape is envisaged as a rugged, multidimensional 
landscape of hills, mountains, and valleys, because individuals with certain sets of properties 
are fitter than others [335], as visualised in Figure 2.3. 
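
To make the metaphor concrete, the following sketch builds a small, hypothetical one-dimensional landscape and walks uphill from every starting point. The height values in HEIGHTS and the function names are invented for illustration and carry no biological meaning; the point is only that a rugged landscape has several peaks, so an uphill walk can stop at a local optimum rather than the global one.

```python
# Hypothetical fitness values for 20 neighbouring solutions (an assumption).
HEIGHTS = [1, 3, 5, 3, 2, 4, 7, 9, 6, 4, 2, 1, 3, 6, 4, 2, 1, 2, 3, 1]

def fitness(position):
    """Height of the landscape at a given solution."""
    return HEIGHTS[position]

def hill_climb(start):
    """Move to the fitter neighbour until no neighbour is fitter."""
    position = start
    while True:
        neighbours = [n for n in (position - 1, position + 1)
                      if 0 <= n < len(HEIGHTS)]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(position):
            return position  # a peak: local or global optimum
        position = best

peaks = sorted({hill_climb(start) for start in range(len(HEIGHTS))})
global_optimum = max(range(len(HEIGHTS)), key=fitness)
print(peaks)           # [2, 7, 13, 18]
print(global_optimum)  # 7
```

Only the walks that begin in the basin around position 7 reach the global optimum; the others stop on lower hills.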

In biological ecosystems, fitness landscapes are virtually impossible to identify. This is both 
because there are large numbers of possible traits that can influence individual fitness, and 
because the environment changes over time and space [29]. In contrast, within a digital 
environment, it is normally possible to specify explicitly the constraints that act on individuals 
in order to evolve solutions that perform better within these constraints. Within genetic 
algorithms, exact specification of a fitness landscape or function is common practice [110]. 
However, within a Digital Ecosystem the ideal constraints are those that allow solution 
populations to evolve to meet user needs with maximum efficiency, with the user needs changing 
from place to place and time to time. In this sense the fitness landscape of a Digital Ecosystem 
is complex and dynamic, and more like that of a biological ecosystem than like that of a 
traditional genetic algorithm [217, 110]. The designer of a Digital Ecosystem therefore faces a 
double challenge: firstly, to specify rules that govern the shape of the fitness function/landscape 
in a way that meaningfully maps landscape dynamics to user requests, and secondly, to evolve
within this space, solution populations that are diverse enough to solve disparate problems,
complex enough to meet user needs, and efficient enough to be preferable to those generated
by other means.

Figure 2.3: Fitness Landscape (modified from [314]): We can represent software development as
a walk through the landscape, towards the peaks which correspond to the optimal applications.
Each point represents a unique combination of software services, and the roughness of the
landscape indicates how difficult it is to reach an optimal software design [314]. In this example,
there is a global optimum, and several lower local optima.

The agents within a Digital Ecosystem will need to be like biological individuals in the sense 
that they reproduce, vary, interact, move, and die [29]. Each of these properties contributes 
to the dynamics of the ecosystem. However, the way in which these individual properties are 
encoded may vary substantially depending on the intended purpose of the system [49].

2.1.3.3 Networks and Spatial Dynamics 

A key factor in the maintenance of diversity in biological ecosystems is spatial interactions, and
several modelling systems have been used to represent these spatial interactions, including
metapopulations¹, diffusion models, cellular automata and agent-based models (termed
individual-based models in ecology) [116]. The broad predictions of these diverse models are
in good agreement. At local scales, spatial interactions favour relatively abundant species
disproportionately. However, at a wider scale, this effect can preserve diversity, because different
species will be locally abundant in different places. The result is that even in homogeneous
environments, population distributions tend to form discrete, long-lasting patches that can
resist an invasion by superior competitors [116]. Population distributions can also be influenced
by environmental variations such as barriers, gradients, and patches. The possible behaviour
of spatially distributed ecosystems is so diverse that scenario-specific modelling is necessary to
understand any real system [119]. Nonetheless, certain robust patterns are observed. These
include the relative abundance of species, which consistently follows a roughly log-normal
relationship [30], and the relationship between geographic area and the number of species
present, which follows a power law [288]. The reasons for these patterns are disputed, because
they can be generated by both spatial extensions of simple Lotka-Volterra competition models
[136], and more complex ecosystem models [293].

¹ A metapopulation is a collection of relatively isolated, spatially distributed, local populations bound together
by occasional dispersal between populations [175, 127, 128].

Figure 2.4: Abstract View of An Ecosystem: Showing different populations (by the different
colours) in different spatial areas, and their connection to one another by the lines. Included are
communities of populations that have become geographically separated and so are not connected
to the main network of the ecosystem, and which could potentially give rise to allopatric
(geographic) speciation [168].
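
The species-area power law mentioned above is commonly written as S = cA^z, where S is the number of species, A the area, and c and z are fitted constants. The sketch below shows how such a relationship can be recovered from data by linear regression on logarithms. The constants c = 10 and z = 0.25, the sample areas, and the function names are hypothetical values chosen for illustration, not measurements from any ecosystem or from our simulations.

```python
import math

def species_area(area, c=10.0, z=0.25):
    """Power-law species-area relationship S = c * A**z (illustrative constants)."""
    return c * area ** z

def fit_power_law(areas, counts):
    """Estimate c and z by least squares on log S = log c + z * log A."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    z = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = math.exp(mean_y - z * mean_x)
    return c, z

areas = [1, 10, 100, 1000]                    # hypothetical areas
counts = [species_area(a) for a in areas]     # synthetic, noise-free counts
c, z = fit_power_law(areas, counts)
print(round(c, 3), round(z, 3))
```

On a log-log plot such data fall on a straight line of slope z, which is how the relationship is usually checked against simulated or observed species counts.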

Landscape connectivity plays an important part in ecosystems. When the density of habitats 
within an environment falls below a critical threshold, widespread species may fragment into 
isolated populations. Fragmentation can have several consequences. Within populations, these 
effects include loss of genetic diversity and detrimental inbreeding [115]. At a broader scale, 
isolated populations may diverge genetically, leading to speciation, as shown in Figure 2.4. 

From an information theory perspective, this phase change in landscape connectivity can 
mediate global and local search strategies [118]. In a well-connected landscape, selection favours 
the globally superior, and pursuit of different evolutionary paths is discouraged, potentially 
leading to premature convergence. When the landscape is fragmented, populations may 
diverge, solving the same problems in different ways. Recently, it has been suggested that the 
evolution of complexity in nature involves repeated landscape phase changes, allowing selection 
to alternate between local and global search [117]. 

In a digital context, we can have spatial interactions by using a distributed system that consists 
of a set of interconnected locations, with agents that can migrate between these connected 
locations. In such systems the spatial dynamics are relatively simple compared with those 
seen in real ecosystems, which incorporate barriers, gradients, and patchy environments at 
multiple scales in continuous space [29]. Nevertheless, depending on how the connections 
between locations are organised, such Digital Ecosystems might have dynamics closely parallel 
to spatially explicit models, diffusion models, or metapopulations [119]. We will discuss later 
the use of a dynamic non-geometric spatial network, and the reasons for using this approach. 
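
As a minimal illustrative sketch of the idea above (not the implementation used in this work), agents can be moved between connected locations of a network held as an adjacency dictionary. The function name `migrate`, the network `links`, and the migration probability `p_move` are all hypothetical, chosen only for illustration.

```python
import random

def migrate(populations, links, p_move, rng):
    """Move each agent to a randomly chosen connected location with probability p_move."""
    new_pops = {loc: [] for loc in populations}
    for loc, agents in populations.items():
        for agent in agents:
            neighbours = links.get(loc, [])
            if neighbours and rng.random() < p_move:
                new_pops[rng.choice(neighbours)].append(agent)
            else:
                new_pops[loc].append(agent)  # no link, or the agent stays put
    return new_pops

# Hypothetical network: location "d" has no links, mimicking a fragmented landscape.
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
populations = {"a": ["x1", "x2"], "b": ["y1"], "c": [], "d": ["z1"]}
rng = random.Random(0)
for _ in range(10):
    populations = migrate(populations, links, p_move=0.5, rng=rng)
```

Agents in the connected locations mix over time, while the agent in the isolated location never leaves it, which is the digital analogue of the fragmentation discussed earlier.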

2.1.3.4 Selection and Self-Organisation 

The major hypothetical advantage of Digital Ecosystems over other complex organisational 
models is their potential for dynamic adaptive self-organisation. However, for the solutions 
evolving in Digital Ecosystems to be useful, they must not only be efficient in a computational 
sense, but they must also solve purposeful problems. That is, the fitness of agents must translate 
in some sense to real-world usefulness as demanded by the users [85]. 

Figure 2.5: Evolving Population of Digital Organisms: A virtual petri dish at three successive 
time-steps, showing the self-organisation of the population undergoing selection. The colour 
shows the genetic variability of the digital organisms. Over time the fitter (purple) organisms 
come to dominate the population, reproducing more and essentially replacing the weaker 
organisms of the population [247]. 

Constructing a useful Digital Ecosystem therefore requires a balance between freedom of the 
system to self-organise, and constraint of the system to generate useful solutions. These factors 
must be balanced because the more the system's behaviour is dictated by its internal dynamics, 
the less it may respond to fitness criteria imposed by the users. At one extreme, when system 
dynamics are mainly internal, agents may evolve that are good at survival and reproduction 
within the digital environment, but useless in the real world [85]. At the other extreme, 
where the users' fitness criteria overwhelmingly dictate function, we suggest that dynamic 
exploration of the solution space, and of complexity, is likely to be limited. The reasoning behind 
this argument is as follows. Consider a multidimensional solution space which maps to a rugged 
fitness landscape [335]. In this landscape, competing solution lineages will gradually become 
extinct through chance processes. So, the solution space explored becomes smaller over time as 
the population adapts and the diversity of solutions decreases. Ultimately, all solutions may be 
confined to a small region of the solution space. In a static fitness landscape, this situation is 
desirable because the surviving solution lineages will usually be clustered around an optimum 
[110]. However, if the fitness landscape is dynamic, the location of optima varies over time, 
and should lineages become confined to a small area of the solution space, then subsequent 
selection will locate only optima that are near this area [217]. This is undesirable if new, higher 
optima arise that are far from pre-existing ones. A related issue is that complex solutions 
are less likely to be found by chance than simple ones. Complex solutions can be visualised 
as sharp, isolated peaks on the fitness landscape. Especially for dynamic landscapes, these 
peaks are most likely to be found when the system explores the solution space widely [217]. 
Therefore, a self-organising mechanism other than the fitness criteria of users is required to 
maintain diversity among competing solutions in a Digital Ecosystem. 

2.1.3.5 Stability and Diversity in Complex Adaptive Systems 

Ecosystems are often described as Complex Adaptive Systems (CAS) because, like other CAS, they 
are systems made from diverse, locally interacting components that are subject to selection. 
Other CAS include brains, individuals, economies, and the biosphere. All are characterised by 
hierarchical organisation, continual adaptation and novelty, and non-equilibrium dynamics. 
These properties lead to behaviour that is non-linear, historically contingent, subject to 
thresholds, and contains multiple basins of attraction [173]. 

In the previous subsections, we have advocated Digital Ecosystems that include agent 
populations evolving by natural selection in distributed environments. Like real ecosystems, 
digital systems designed in this way fit the definition of CAS. The features of these systems, 
especially non-linearity and non-equilibrium dynamics, offer both advantages and hazards for 
adaptive problem-solving. The major hazard is that the dynamics of CAS are intrinsically 
hard to predict because of the non-linear emergent self-organisation [174]. This observation 
implies that designing a useful Digital Ecosystem will be partly a matter of trial and error. 
The occurrence of multiple basins of attraction in CAS suggests that even a system that 
functions well for a long period may suddenly at some point transition to a less desirable 
state [97]. For example, in some types of system self-organising mass extinctions might result 
from interactions among populations, leading to temporary unavailability of diverse solutions 
[230]. This concern may be addressed by incorporating negative feedback or other mechanisms 
at the global scale. The challenges in designing an effective Digital Ecosystem are mirrored by 
the system's potential strengths. Non-linear behaviour provides the opportunity for scalable 
organisation and the evolution of complex hierarchical solutions, while rapid state transitions 
potentially allow the system to adapt to sudden environmental changes with minimal loss of 
functionality [173]. 




Figure 2.6: Ecosystems as Complex Adaptive Systems (modified from [9]): (LEFT) An abstract 
view of an ecosystem showing the diversity of different populations by the different colours and 
spacing. (RIGHT) An abstract view of diversity within a population, with the space between 
points showing genetic diversity and the clustering prevalent. 



A key question for designers of Digital Ecosystems is how the stability and diversity properties 
of biological ecosystems map to performance measures in digital systems. For a Digital 
Ecosystem the ultimate performance measure is user satisfaction, a system-specific property. 
However, assuming the motivation for engineering a Digital Ecosystem is the development of 
scalable, adaptive solutions to complex dynamic problems, certain generalisations can be made. 
Sustained diversity [97] is a key requirement for dynamic adaptation. In Digital Ecosystems, 
diversity must be balanced against adaptive efficiency because maintaining large numbers of 
poorly-adapted solutions is costly. The exact form of this trade-off will be guided by the specific 
requirements of the system in question. Stability [173] is likewise a trade-off: we want the 
system to respond to environmental change with rapid adaptation, but not to be so responsive 
that mass extinctions deplete diversity or sudden state changes prevent control. 

2.1.4 Computing of Digital Ecosystems 

Based on the understanding of biological ecosystems, from the theoretical biology of the previous 
subsection, we will now introduce fields from the domain of computer science relevant to the 
creation of Digital Ecosystems. As we are interested in the digital counterparts for the behaviour 
and constructs of biological ecosystems, instead of simulating or emulating such behaviour or 
constructs, we will consider what parallels can be drawn. 

The value of creating parallels between biological and computer systems varies substantially 
depending on the behaviours or constructs being compared, and sometimes a parallel cannot be 
drawn convincingly. For example, both have mechanisms to ensure data integrity. In computer 
systems that integrity is absolute: data replication which introduces even the most minor 
change is considered to have failed, and this is supported by mechanisms such as the 
Message-Digest algorithm 5 (MD5) [266]. In biological systems, by contrast, the genetic code is 
transcribed with a remarkable degree of fidelity; there is, approximately, only one unforced error 
per one hundred bases copied [202]. There are also elaborate proof-reading and correction systems, 
which in evolutionary terms are highly conserved [202]. In this example establishing a parallel is 
infeasible, despite 
the relative similarity in function, because the operational control mechanisms in biological and 
computing systems are radically different, as are the aims and purposes. This is a reminder 
that considerable finesse is required when determining parallels, or when using existing ones. 
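
To make the "absolute" notion of integrity in computer systems concrete, the following minimal sketch uses the MD5 implementation in Python's standard `hashlib` module to compare a replicated byte string against its original. The helper names `digest` and `replicated_intact`, and the sample data, are assumptions made for illustration only.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the MD5 digest of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

def replicated_intact(original: bytes, copy: bytes) -> bool:
    """A copy is accepted only if its digest matches that of the original."""
    return digest(original) == digest(copy)

original = b"ACGTACGTACGT"      # hypothetical payload
exact_copy = bytes(original)
mutated_copy = b"ACGTACGTACGA"  # a single changed byte

print(replicated_intact(original, exact_copy))
print(replicated_intact(original, mutated_copy))
```

A single changed byte is enough for the check to fail, whereas biological copying tolerates, and evolution indeed relies upon, occasional errors.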

We will start by considering Multi-Agent Systems to explore the references to agents and 
migration; followed by evolutionary computing and Service-Oriented Architectures for the 
references to evolution and self-organisation. 



2.1.4.1 Multi-Agent Systems 

A software agent is a piece of software that acts autonomously in an environment, for a user in 
a relationship of agency, to meet its designed objectives [334]. A Multi-Agent System 
(MAS) is a system composed of several software agents, collectively capable of reaching goals 
that are difficult to achieve by an individual agent or monolithic system [334]. Conceptually, 
there is a strong parallel between the software agents of a MAS and the agent-based models of 
a biological ecosystem [116], despite the lack of evolution and migration in a MAS. There is an 
even stronger parallel to a variant of MASs, called mobile agent systems, in which the mobility 
also mirrors the migration in biological ecosystems [249]. 

Figure 2.7: Mobile Agent System: Visualisation that shows mobile agents as programmes 
that can migrate from one host to another in a network of heterogeneous computer systems 
and perform a task specified by their owner. On each host they visit, mobile agents need special 
software called an agent station, which is responsible for executing the agents and providing a 
safe execution environment [200]. 

The term mobile agent contains two separate and distinct concepts: mobility and agency [269]. 
Hence, mobile agents are software agents capable of movement within a network [249]. The 
mobile agent paradigm proposes to treat a network as multiple agent-friendly environments 
and the agents as programmatic entities that move from location to location, performing tasks 
for users. So, on each host they visit, mobile agents need software which is responsible for their 
execution, providing a safe execution environment [249]. 

Generally, there are three types of design for mobile agent systems [249]: (1) using a specialised 
operating system, (2) as operating system services or extensions, or (3) as application software. 
The first approach has the operating system providing the requirements of mobile agent systems 
directly [301]. The second approach implements the mobile agent system requirements as 
operating system extensions, taking advantage of existing features of the operating system [146]. 
Lastly, the third approach builds mobile agent systems as specialised application software that 
runs on top of an operating system, to provide for the mobile agent functionality, with such 
software being called an agent station [200]. In this last approach, each agent station hides 
the vendor-specific aspects of its host platform, and offers standardised services to visiting 
agents. Services include access to local resources and applications (for example, web servers 
or web services), the local exchange of information between agents via message passing, basic 
security services, and the creation of new agents [200]. The third approach is also the most 
platform-agnostic, and is visualised in Figure 2.7. 

2.1.4.2 Evolutionary Computing 

For evolving software in Digital Ecosystems, evolutionary computing is the logical field from 
which to start. In Biologically-Inspired Computing, one of the primary sources of inspiration 
from nature has been evolution [195]. Evolution has been clearly identified as the source 
of many diverse and creative solutions to problems in nature [67, 104]. However, it can 
also be useful as a problem-solving tool in artificial systems. Computer scientists and other 
theoreticians realised that the selection and mutation mechanisms that appear so effective 
in biological evolution could be abstracted to be implemented in a computational algorithm 
[195]. Evolutionary computing is now recognised as a sub-field of artificial intelligence (more 
particularly computational intelligence) that involves combinatorial optimisation problems [14]. 

Evolutionary algorithms are based upon several fundamental principles from biological 
evolution, including reproduction, mutation, recombination (crossover), natural selection, and 
survival of the fittest. As in biological systems, evolution occurs by the repeated application of 
the above operators [13]. An evolutionary algorithm operates on a set of individuals, called a 
population. An individual, in the natural world, is an organism with an associated fitness [168]. 
Candidate solutions to an optimisation problem play the role of individuals in a population, and 
a cost function determines the environment within which the solutions live, analogous to the 
way the environment selects for the fittest individuals. The number of individuals varies between 
different implementations and may also vary through time during the use of the algorithm. 
Each individual possesses some characteristics that are defined through its genotype, its genetic 
composition. These characteristics may be passed on to descendants of that individual [13]. 
Processes of mutation (small random changes) and crossover (generation of a new genotype by 
the combination of components from two individuals) may occur, resulting in new individuals 
with genotypes different from the ancestors they will come to replace. These processes iterate, 
modifying the characteristics of the population [13]. Which members of the population are 
kept, or are used as parents for offspring, will often depend upon some external characteristic, 
called the fitness (cost) function of the population. It is this that enables improvement to occur 
[13], and corresponds to the fitness of an organism in the natural world [168]. Recombination 
and mutation create the necessary diversity and thereby facilitate novelty, while selection acts 
as a force increasing quality. Changed pieces of information resulting from recombination and 
mutation are randomly chosen. However, selection operators can be either deterministic, or 
stochastic. In the latter case, individuals with a higher fitness have a higher chance to be 
selected than individuals with a lower fitness [13]. 
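
The generic loop described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than any specific published algorithm: the toy fitness function (counting 1-bits), the tournament size, the mutation rate, and the use of elitism are all choices made here for the example.

```python
import random

def fitness(genotype):
    """Hypothetical fitness: the number of 1-bits (the OneMax toy problem)."""
    return sum(genotype)

def tournament(population, rng, k=2):
    """Stochastic selection: the fitter of k randomly sampled individuals wins."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """One-point recombination of two parent genotypes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genotype, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genotype]

def evolve(pop_size=20, length=16, generations=30, rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: keep the best unchanged
        offspring = []
        for _ in range(pop_size - 1):
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            offspring.append(mutate(crossover(parent_a, parent_b, rng), rng, rate))
        population = [elite] + offspring
        history.append(max(map(fitness, population)))
    return population, history

population, history = evolve()
```

Recombination and mutation supply variation, while the tournament applies stochastic selection pressure; because the best individual is carried over unchanged, the best fitness recorded in `history` never decreases.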

There are different strands of what has come to be called evolutionary computing [13]. The first 
is genetic algorithms. A second strand, evolution strategies, focuses strongly on engineering 
applications. A third strand, evolutionary programming, originally developed from machine 
intelligence motivations, and is related to the other two. These areas developed separately 
for about fifteen years, but since the early nineties they have been seen as different representatives 
(dialects) of one technology, called evolutionary computing [90]. In the early nineties, a 
fourth stream following the same general ideas emerged, called genetic programming [90]. 

Genetic algorithms [110] implement a population of individuals, each of which possesses a 
genotype that encodes a candidate solution to a problem. Typically genotypes are encoded 
as bit-strings, but other encodings have been used in more recent developments of genetic 
algorithms. Mutation and crossover, along with selection, are then used to choose a solution to 
a problem. They have proven to be widely applicable, and have resulted in many applications 
in differing domains [212]. Evolution strategies arose out of an attempt by several civil 
engineers to understand a problem in hydrodynamics [74]. Evolution strategies [278] differ 
from genetic algorithms in operating on real-valued parameters, and historically they have 
tended not to use crossover as a variational operator, only mutation. However, mutation 
rates have themselves been allowed to adapt in evolution strategies, which is not often 
the case with genetic algorithms. Evolution strategies have also been used for many 
applications [88]. 
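
The contrast with genetic algorithms can be illustrated with a minimal sketch of a (1+1) evolution strategy operating on real-valued parameters, using mutation only and letting the mutation step size adapt alongside the solution. The cost function `sphere`, the log-normal update of `sigma`, and the learning rate `tau` are common textbook choices assumed here for illustration; they are not taken from the cited sources.

```python
import math
import random

def sphere(x):
    """Hypothetical cost function to minimise: the sum of squares."""
    return sum(v * v for v in x)

def one_plus_one_es(x, sigma, steps, rng):
    """(1+1)-ES: one parent, one mutated child, and the better of the two survives."""
    tau = 1.0 / math.sqrt(len(x))  # assumed learning rate for the step size
    best = sphere(x)
    for _ in range(steps):
        # The step size is itself mutated, then used to perturb the parameters.
        child_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
        child = [v + child_sigma * rng.gauss(0.0, 1.0) for v in x]
        cost = sphere(child)
        if cost <= best:  # the child replaces the parent only if it is no worse
            x, sigma, best = child, child_sigma, cost
    return x, sigma, best

start = [3.0, -2.0, 1.0]
solution, final_sigma, final_cost = one_plus_one_es(start, sigma=1.0, steps=500,
                                                    rng=random.Random(2))
```

Because a step size survives only together with the child it produced, step sizes that generate improvements tend to be retained, which is the sense in which the mutation rate adapts.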

Evolutionary programming arose distinctly from the first two strands of evolutionary 
computation, out of an attempt to understand machine intelligence through the evolution of finite state 
machines [96]. Evolutionary programming [252] emphasises the evolution of the phenotype 
(instance of a solution) instead of the genotype (genetic material) of individuals, and the 
relation between the phenotype of parents and offspring, although crossover is not used. Thus, 
evolutionary programming has some differences in approach from the other major strands of 
evolutionary computation research. However, there have been many overlaps between the 
different fields and it too has been applied in many areas [252]. 

Genetic programming [156] can be considered as a variant of genetic algorithms where individual 
genotypes are represented by executable programmes. Specifically, solutions are represented 
as trees of expressions in an appropriate programming language, with the aim of evolving the 
most effective programme for solving a particular problem. Genetic programming, although 
the newest form of evolutionary computing, has still proved to be widely applicable [22]. 
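
As a minimal sketch of the representation described above, a candidate programme can be held as a nested tuple forming an expression tree, and scored by how closely it reproduces a set of input and output pairs. The tuple encoding, the operator table `OPS`, and the target function are assumptions made for illustration; the variation operators of genetic programming are omitted.

```python
import operator

# Hypothetical function set available to evolved programmes.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    """Recursively evaluate an expression tree for the input value x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(tree, cases):
    """Total absolute error over the test cases; lower is fitter."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

# Hypothetical target behaviour, x * x + 1, given only as input/output pairs.
cases = [(x, x * x + 1) for x in range(-3, 4)]
candidate_a = ("+", ("*", "x", "x"), 1)  # encodes x * x + 1
candidate_b = ("+", "x", 1)              # encodes x + 1

print(error(candidate_a, cases), error(candidate_b, cases))
```

Fitness here is measured only against the supplied cases, so a zero error shows agreement on those cases rather than proving that the programme computes the intended function in general.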

Many important questions remain to be answered in understanding the performance of 
evolutionary algorithms. For example, current evolutionary algorithms for evolving programmes 
(genetic programming) suffer from some weaknesses. First, while they are moderately successful 
at evolving simple programmes, it is very difficult to scale them to evolve high-level software 
components [191]. Second, the estimated fitness of a programme is normally given by a measure 
of how accurately it computes a given function, as represented by a set of input and output 
pairs, and therefore there is only a limited guarantee that the evolved programme actually 
does the intended computation [191]. These issues are particularly important when evolving 
high-level, complex, structured software. 

To evolve high-level software components in Digital Ecosystems, we propose taking advantage 
of the native method of software advancement, human developers, and the use of evolutionary 
computing for combinatorial optimisation [240] of the available software services. This involves 
treating developer-produced software services as the functional building blocks, the base 
units in a genetic-algorithms-based process. Such an approach would require a modular, reusable 
paradigm for software development, such as Service-Oriented Architectures, which are discussed 
in the following subsection. 

2.1.4.3 Service-Oriented Architectures 

Our approach to evolving high-level software applications requires a modular, reusable paradigm 
for software development. Service-Oriented Architectures (SOAs) are the current state-of-the-art 
approach, being the current iteration of interface/component-based design from the 1990s, 
which was itself an iteration of event-oriented design from the 1980s, and before then modular 
programming from the 1970s [44, 158]. Service-oriented computing promotes assembling 
application components into a loosely coupled network of services, to create flexible, dynamic 
business processes and agile applications that span organisations and computing platforms 
[243]. This is achieved through a SOA, an architectural style that guides all aspects of 
creating and using business processes throughout their life-cycle, packaged as services. This 
includes defining and provisioning infrastructure that allows different applications to exchange 
data and participate in business processes, loosely coupled from the operating systems and 
programming languages underlying the applications [228]. Hence, a SOA represents a model in 
which functionality is decomposed into distinct units (services), which can be distributed over 
a network, and can be combined and reused to create business applications [243]. 

A SOA depends upon service-orientation as its fundamental design principle. In a SOA 
environment, independent services can be accessed without knowledge of their underlying 
platform implementation [228]. Services reflect a service-oriented approach to programming 
that is based on composing applications by discovering and invoking network-available services 
to accomplish some task. This approach is independent of specific programming languages or 
operating systems, because the services communicate with each other by passing data from 
one service to another, or by co-ordinating an activity between two or more services [243]. So, 
the concepts of SOAs are often seen as building upon, and developing, the concepts of 
modular programming and distributed computing [158]. 

SOAs allow for an information system architecture that enables the creation of applications 
that are built by combining loosely coupled and interoperable services [228]. They typically 
implement functionality most people would recognise as a service, such as filling out an online 
application for an account, or viewing an online bank statement [158]. Services are intrinsically 
unassociated units of functionality, without calls to each other embedded in them. Instead 
of services embedding calls to each other in their source code, protocols are defined which 
describe how services can talk to each other, in a process known as orchestration, to meet new 
or existing business system requirements [287]. This is allowing an increasing number of 
third-party software companies to offer software services, such that SOA systems will come to consist 
of such third-party services combined with others created in-house, which has the potential 
to spread costs over many users and uses, and promote standardisation both in and across 
industries [51]. For example, the travel industry now has a well-defined, and documented, set 
of both services and data, sufficient to allow any competent software engineer to create travel 
agency software using entirely off-the-shelf software services [155, 46]. Other industries, such 
as the finance industry, are also making significant progress in this direction [341]. 

The vision of SOAs assembling application components from a loosely coupled network 
of services, that can create dynamic business processes and agile applications that span 
organisations and computing platforms, is visualised in Figure 2.8. It will be made possible 
by creating compound solutions that use internal organisational software assets, including 
enterprise information and legacy systems, and combining these solutions with external 
components residing in remote networks [242]. The great promise of SOAs is that the marginal 
cost of creating the n-th application is virtually zero, as all the software required already exists 
to satisfy the requirements of other applications. Only their combination and orchestration are 
required to produce a new application [305, 213]. The key is that the interactions between the 
services are not specified within the services themselves. Instead, the interaction of services (all 
of which are hosted by unassociated peers) is specified by users in an ad-hoc way, with the 
intent driven by newly emergent business requirements [176]. 
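
The separation between services and their orchestration can be sketched as follows. This is a simplified, in-process illustration under stated assumptions: the three travel-related services, the `REGISTRY` lookup table, and the `orchestrate` function are hypothetical, and a real SOA would invoke network-available services through standardised interfaces rather than local function calls.

```python
def search_flights(request):
    """Hypothetical service: adds a flight to the request."""
    return {**request, "flight": f"{request['origin']}-{request['destination']}"}

def book_hotel(request):
    """Hypothetical service: adds a hotel at the destination."""
    return {**request, "hotel": f"hotel in {request['destination']}"}

def take_payment(request):
    """Hypothetical service: marks the request as paid."""
    return {**request, "paid": True}

# Services are looked up by name; none of them contains a call to another.
REGISTRY = {"flights": search_flights, "hotel": book_hotel, "payment": take_payment}

def orchestrate(plan, request):
    """Apply the named services in the order given by an externally supplied plan."""
    for name in plan:
        request = REGISTRY[name](request)
    return request

trip = orchestrate(["flights", "hotel", "payment"],
                   {"origin": "LHR", "destination": "JFK"})
day_trip = orchestrate(["flights", "payment"],
                       {"origin": "LHR", "destination": "CDG"})
```

The second application reuses the existing services with a different plan and no new service code, which is the sense in which only combination and orchestration are needed to produce a new application.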

The pinnacle of SOA interoperability is the exposing of services on the internet as web services 
[228]. A web service is a specific type of service that is identified by a Uniform Resource 
Identifier (URI), whose service description and transport utilise open Internet standards. 
Interactions between web services typically occur as Simple Object Access Protocol (SOAP) 
calls carrying Extensible Markup Language (XML) data content. The interface descriptions 
of web services are expressed using the Web Services Description Language (WSDL) [241]. 
The Universal Description, Discovery and Integration (UDDI) standard defines a protocol for 
directory services that contain web service descriptions. UDDI enables web service clients to 
locate candidate services and discover their details. Service clients and service providers utilise 
these standards to perform the basic operations of SOAs [241]. Service aggregators can then 
use the Business Process Execution Language (BPEL) to create new web services by defining 
corresponding compositions of the interfaces and internal processes of existing services [241]. 

Figure 2.8: Service-Oriented Architectures: Abstract visualisations, with the first image showing 
the loosely joined services as cuboids, and the service orchestration as a polyhedron; and the 
second image showing their high interoperability and re-usability in forming applications, from 
the use of standardised interfaces and external service orchestration. 

SOA services inter-operate based on a formal definition (or contract, e.g. WSDL) that is 
independent of the underlying platform and programming language. Service descriptions 
are used to advertise the service capabilities, interface, behaviour, and quality [241]. The 
publication of such information about available services provides the necessary means for 
discovery, selection, binding, and composition of services [241]. The (expected) behaviour 
of a service during its execution is described by its behavioural description (for example, 
as a workflow process). Also included is a quality of service (QoS) description, which 
publishes important functional and non-functional service quality attributes, such as service 
metering and cost, performance metrics (response time, for instance), security attributes, 
integrity (transactional), reliability, scalability, and availability [241]. Service clients (end-user 
organisations that use some service) and service aggregators (organisations that consolidate 
multiple services into a new, single service offering) utilise service descriptions to achieve their 
objectives [241]. One of the most important and continuing developments in SOAs is Semantic 
Web Services (SWS), which make use of semantic descriptions for service discovery, so that a 
client can discover the services semantically [254, 42]. 

There are multiple standards available and still being developed for SOAs [320], the most notable 
recent one being REpresentational State Transfer (REST) [287]. The software industry now widely 
implements a thin SOAP/WSDL/UDDI veneer atop existing applications or components that 
implement the web services paradigm [242], but the choice of technologies will change with time. 
Therefore, the fundamentals of SOAs and their services are best defined generically, because SOAs 
are technology agnostic and need not be tied to a specific technology [243]. Within the current 
and future scope of the fundamentals of SOAs, there is clearly potential to evolve complex 
high-level software applications from the modular services of SOAs, instead of the instruction 
level evolution currently prevalent in genetic programming [157]. 

2.1.4.4 Distributed Evolutionary Computing 

Having previously introduced evolutionary computing, and the possibility of it occurring within 
a distributed environment, not unlike those found in mobile agent systems, we now consider 
a specialised form known as distributed evolutionary computing (DEC). The motivation for 
using parallel or distributed evolutionary algorithms is twofold: first, improving the speed of 
evolutionary processes by conducting concurrent evaluations of individuals in a population; 
second, improving the problem-solving process by overcoming difficulties that face traditional 
evolutionary algorithms, such as maintaining diversity to avoid premature convergence [219, 
297]. The fact that evolutionary computing manipulates a population of independent solutions 
actually makes it well suited for parallel and distributed computation architectures [45]. There 
are several variants of distributed evolutionary computing, leading some to propose a taxonomy 
for their classification [235]. There are two main forms [45, 297]: 

• multiple-population/coarse-grained migration/island models 

• single-population/fine-grained diffusion/neighbourhood models 

In the coarse-grained island models [178, 45], evolution occurs in multiple parallel sub- 
populations (islands), each running a local evolutionary algorithm, evolving independently with 
occasional migrations of highly fit individuals among sub-populations. The core parameters for 
the evolutionary algorithm of the island-models are as follows [178]: 

• number of sub-populations: 2, 3, 4, or more 

• sub-population homogeneity 

— size, crossover rate, mutation rate, migration interval 

• topology of connectivity: ring, star, fully-connected, random 

• static or dynamic connectivity 

• migration mechanisms: 

— isolated / synchronous / asynchronous 

— how often migrations occur 

— which individuals migrate 
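
The parameters listed above can be made concrete with a minimal sketch of an island model. Every setting here is an assumption chosen for illustration: four homogeneous sub-populations, a static ring topology, synchronous migration at a fixed interval, the fittest individual of each island as the migrant, and a deliberately simple local evolutionary algorithm on a toy bit-counting fitness function.

```python
import random

def fitness(genotype):
    """Hypothetical fitness: the number of 1-bits."""
    return sum(genotype)

def local_step(island, rng, rate=0.05):
    """One step of a simple local EA: mutate a tournament winner, replace the worst."""
    parent = max(rng.sample(island, 2), key=fitness)
    child = [1 - g if rng.random() < rate else g for g in parent]
    worst = min(range(len(island)), key=lambda i: fitness(island[i]))
    if fitness(child) >= fitness(island[worst]):
        island[worst] = child

def migrate(islands, topology):
    """Synchronous migration: copy each island's fittest individual to its neighbours."""
    emigrants = [max(island, key=fitness) for island in islands]
    for source, targets in topology.items():
        for target in targets:
            pool = islands[target]
            worst = min(range(len(pool)), key=lambda i: fitness(pool[i]))
            pool[worst] = list(emigrants[source])  # the migrant replaces the worst resident

def run(n_islands=4, size=10, length=20, generations=200, interval=20, seed=3):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
               for _ in range(n_islands)]
    ring = {i: [(i + 1) % n_islands] for i in range(n_islands)}  # static ring topology
    for generation in range(1, generations + 1):
        for island in islands:
            local_step(island, rng)
        if generation % interval == 0:  # how often migrations occur
            migrate(islands, ring)
    return islands

islands = run()
```

Changing `ring` to another adjacency dictionary alters the topology of connectivity, and changing `interval` alters how isolated the sub-populations are between migrations.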

Fine-grained diffusion models [189, 297] assign one individual per processor. A local 
neighbourhood topology is assumed, and individuals are allowed to mate only within their 
neighbourhood, called a deme². The demes overlap by an amount that depends on their 
shape and size, and in this way create an implicit migration mechanism. Each processor 
runs an identical evolutionary algorithm which selects parents from the local neighbourhood, 
produces an offspring, and decides whether to replace the current individual with an offspring. 
However, even with the advent of multi-processor computers, and more recently multi-core 
processors, which provide the ability to execute multiple threads simultaneously [193], this 
approach would still prove impractical in supporting the number of agents necessary to create 
a Digital Ecosystem. Therefore, we shall further consider the island models. 

² In biology, a deme is a local population of organisms of one species that actively interbreed with 
one another and share a distinct gene pool [76]. 

Figure 2.9: Island-Model of Distributed Evolutionary Computing [178, 45]: The probability of 
migrating from one island to another can differ from the probability of migrating in the reverse 
direction. This mirrors the naturally inspired quality that, although two populations have the 
same physical separation, it may be easier to migrate in one direction than the other, e.g. fish 
migration is easier downstream than upstream. 

An example island-model [178, 45] is visualised in Figure 2.9, in which the probability of 
migrating from one island to another can differ from the probability of migrating in the reverse 
direction. This allows maximum flexibility for the migration process, and mirrors the naturally 
inspired quality that, although two populations have the same physical separation, it may be 
easier to migrate in one direction than the other, e.g. fish migration is easier downstream than 
upstream. The migration of the island models is like the notion of migration in nature, being 
similar to the metapopulation models of theoretical ecology [175]. This model has also been 
used successfully in the determination of investment strategies in the commercial sector, in a 
product known as the Galapagos toolkit [325, 56]. However, all the islands in this approach 
work on exactly the same problem, which makes it less analogous to biological ecosystems in 
which different locations can be environmentally different [29]. We will take advantage of this 
property later when defining the Ecosystem-Oriented Architecture of Digital Ecosystems. 
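
The island-model described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the toy OneMax fitness (the sum of a bit string), the tournament selection, the replacement of the least fit resident, and all names and parameter values (`evolve_island`, `migrate`, `run_islands`, `MIGRATION_PROB`) are assumptions chosen only to show sub-populations evolving independently with occasional migration along directed, asymmetric connection probabilities.

```python
import random

def evolve_island(population, fitness, rng, mutation_rate=0.05):
    """Run one generation of a simple local evolutionary algorithm on one island."""
    def select():
        # Binary tournament: the fitter of two randomly chosen individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    children = []
    for _ in range(len(population)):
        mother, father = select(), select()
        cut = rng.randrange(1, len(mother))  # one-point crossover
        child = mother[:cut] + father[cut:]
        # Bit-flip mutation, applied independently to each gene.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children

def migrate(islands, migration_prob, fitness, rng):
    """Copy island i's fittest individual to island j with probability migration_prob[i][j]."""
    emigrants = [max(pop, key=fitness) for pop in islands]
    for i, emigrant in enumerate(emigrants):
        for j, target in enumerate(islands):
            if i != j and rng.random() < migration_prob[i][j]:
                # Assumed policy: the immigrant replaces the least fit resident.
                worst = min(range(len(target)), key=lambda k: fitness(target[k]))
                target[worst] = list(emigrant)

def run_islands(migration_prob, pop_size=10, genome_len=12,
                generations=30, interval=5, seed=0):
    rng = random.Random(seed)
    fitness = sum  # toy OneMax problem, shared by every island
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(pop_size)] for _ in migration_prob]
    for generation in range(1, generations + 1):
        islands = [evolve_island(pop, fitness, rng) for pop in islands]
        if generation % interval == 0:  # migration interval
            migrate(islands, migration_prob, fitness, rng)
    return islands

# Directed probabilities: moving from island 0 to 1 is easier than from 1 to 0.
MIGRATION_PROB = [[0.0, 0.8, 0.1],
                  [0.2, 0.0, 0.8],
                  [0.8, 0.2, 0.0]]
final_islands = run_islands(MIGRATION_PROB)
```

Note that every island here optimises the same fitness function, which is exactly the limitation of the classical island-model noted above.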



2.1.5 Digital Business Ecosystems 

The questions we have raised are wide-ranging, and are motivating several interdisciplinary 
research teams, including those involved in an EU Framework VI project called Digital Business 
Ecosystems (DBEs). The DBE is a proposed methodology for economic and technological 
innovation. Specifically, the DBE is a software infrastructure for supporting large numbers of 
interacting business users and services [222]. The DBE aims to be a next generation Information 
and Communications Technology that will extend the Service-Oriented Architecture concept 
with the automatic combining of available and applicable services in a scalable architecture, to 
meet business user requests for applications that facilitate business processes. In essence, the 
DBE will be an internet-based environment in which businesses will be able to interact with 
each other in very effective and efficient ways [223]. 

Figure 2.10: Business Ecosystem [222]: Conceptual visualisation [93] showing a Business 
Ecosystem of interacting Small and Medium sized Enterprise users, via the services they provide 
and consume, creating a network of business ecosystems distributed over different geographical 
regions, business domains, and industry sectors. 

The concept of Digital Business Ecosystems emerged by adding 'digital' [221] in front of 
'business ecosystem' [214]. The term Digital Business Ecosystem was used earlier, 
but with a focus exclusively on developing countries [215]. The generalisation of the term 
to refer to a new interpretation of what socio-economic development catalysed by ICT means 
was new, emphasising the co-evolution between the business ecosystem and its partial digital 
representation: the digital ecosystem. The term Digital Business Ecosystem came to represent 
the combination of the two ecosystems [222]. 

The business ecosystem is an economic community supported by a foundation of interacting 
organisations and individuals; i.e. the organisms of the business world. This economic 
community produces goods and services of value to customers, who are themselves members of 
the ecosystem [214]. A wealthy ecosystem sees a balance between co-operation and competition 
in a dynamic free market. Regarding a particular business ecosystem, two main 
interpretations of its structure have been discussed in the literature. The keystone model 
has a structure in which a business ecosystem is dominated by a large firm that is surrounded 
by many small suppliers [138]. This model works well when the central firm is healthy, but 
represents a significant weakness for the economy of the region when the dominant economic 
actor experiences difficulties [214]. This model also matches the economic structure of the 
USA where there is a predominant number of large enterprises at the centre of large value 
networks of suppliers [138]. However, the model for a business ecosystem developed in Europe 
is less structured and more dynamic; it is composed mainly of Small and Medium sized 
Enterprises (SMEs), but can accommodate large firms [275]. All actors complement one 
another, leading to a more dynamic division of labour, organised along one-dimensional value 
chains and two-dimensional value networks [60]. This model is particularly well-adapted for 
the service and knowledge industries, where it is easier for small firms to reinvent themselves 
than, for instance, in the automotive industry which is dominated by large enterprises [222]. 

In the DBE, the digital ecosystem is the technical infrastructure, based on a peer-to-peer (P2P) 
distributed software technology that transports, finds, and connects services and information 
over Internet links enabling networked transactions, and the distribution of all the digital objects 
present within the infrastructure [222]. Such organisms of the digital world encompass any 
useful digital representations expressed by languages (formal or natural) that can be interpreted 
and processed (by computer software and/or humans), e.g. software applications, services, 
knowledge, taxonomies, folksonomies, ontologies, descriptions of skills, reputation and trust 
relationships, training modules, contractual frameworks, laws [222]. So, the Digital Business 
Ecosystem is a biological metaphor that highlights the interdependence of all actors in the 
business environment, who co-evolve their capabilities and roles [214], and which has attempted 
to develop an isomorphic model between biological behaviour and the behaviour of the digital 
ecosystem, leading to an evolutionary, self-organising, and self-optimising environment built 
upon an underlying Service-Oriented Architecture [223]. 

The DBE aims to help local economic actors become active players in globalisation [82], 
valorising their local culture and vocations, enabling them to interact and create value networks 
at the global level. Increasingly this approach, dubbed glocalisation, is being considered a 
successful strategy of globalisation that preserves regional growth and identity [267, 303, 148], 
and has been embraced by the mayors and decision-makers of thousands of municipalities [109], 
because of the possible tension between globalisation and localisation when adopting ICTs [47]. 

The DBE represents a business-to-business (B2B) interaction concept supported by a software 
platform (digital ecosystem) that is intended to have the desirable properties of biological 
ecosystems [83], and its researchers also recognise the importance of Service-Oriented 
Architectures in creating Digital Ecosystems [260, 223]. So, we will consider using the concept 
of a business ecosystem as a potential user base for Digital Ecosystems. 

2.2 Ecosystem-Oriented Architectures 

We will now define the architectural principles of Digital Ecosystems. We will use our 
understanding of theoretical biology from section 2.1.3, mimicking the processes and structures 
of life, evolution, and ecology of biological ecosystems. We will achieve this by combining 
elements from mobile agents systems, distributed evolutionary computing, and Service-Oriented 
Architectures from section 2.1.4, to create a hybrid architecture which is the digital counterpart 
of biological ecosystems. 

We will refer to the agents of Digital Ecosystems as Agents, populations as Populations, and the 
habitats as Habitats, to distinguish their new hybrid definitions from their original biological 
and computing definitions. 

2.2.1 Agents 

The Agents of the Digital Ecosystem are functionally analogous to the organisms of biological 
ecosystems, including the behaviour of migration and the ability to be evolved [29], and will be 
achieved through using a hybrid of different technologies. The ability to migrate is provided 
by using the paradigm of agent mobility from mobile agent systems [249], with the Habitats of 
the Digital Ecosystem provided by the facilities of agent stations from mobile agent systems 
[200], i.e. a distributed network of locations to migrate to and from. The Habitats, and the 
Habitat network will be discussed later. The ability of the Agents to be evolved is in two 
parts: first, by using the interoperability of services from Service-Oriented Architectures [228] 
to aggregate Agents; and second, the use of evolutionary computing [90] for combinatorial 
optimisation [240] at the Habitats to evolve optimal aggregations of Agents. The Agents will 
take advantage of the interoperability of Service-Oriented Architectures [228], by acting in a 
relationship of agency [334] to the user-supplied semantic web services, which will be 
Service-Oriented Architecture compliant [241]. We can then evolve high-level software applications by 
using evolutionary computing [90] for combinatorial optimisation [240] of the available Agents, 
or rather the services they represent, in a genetic-algorithms-based [110] process. This makes 
an Agent of the Digital Ecosystem a lightweight entity consisting primarily of a pointer to the 
semantic web service it represents, including the service's executable component and semantic 
description. A software service can be a software service only, e.g. for data encryption, or a 
software service providing a front-end to a real-world service, e.g. selling books, as shown in 
Figure 2.11. 

Figure 2.11: Agent of the Digital Ecosystem: A lightweight entity consisting primarily of 
a pointer to the semantic web service it represents, which is Service-Oriented Architecture 
compliant and therefore includes an executable component and semantic description. A software 
service can be a software service only, e.g. for data encryption, or a software service providing 
a front-end to a real-world service, e.g. selling books. 



An organism within Digital Ecosystems is an Agent, or an Agent aggregation created using 
evolutionary optimisation in response to a user request for an application. These Agents will 
migrate through the Habitat network of the Digital Ecosystem and adapt to find niches where 
they are useful in fulfilling other user requests for applications. The Agents interact, evolve, and 
adapt over time to the environment, thereby serving the ever-changing requirements imposed 
by the user base. 

The executable component, of a semantic web service that an Agent represents, is equivalent to 
the DNA of an organism, whose sequence encodes the genetic information of living organisms 
and has two primary functions [168]: the holder of virtually all information in inheritance, 
and the controller of protein synthesis for the construction and operation of its organism. 
Equivalently, the executable component is also the inheritable component from one generation 
to the next, and defines the objects and behaviour of its service's run-time instantiation. 

The genotype of an individual describes the genetic constitution (DNA) of an individual, 
independent of its physical existence (the phenotype) [168]. Equivalently, the semantic 
description, of a semantic web service that an Agent represents, describes the functionality 
of the executable component. The phenotype of an individual arises from the combination of 
an organism's DNA and the environment [168]. Equivalently, the run-time instantiation, of a 
service that an Agent represents, results from instantiating the executable component in the 
run-time environment. This differentiation between genotype and phenotype is fundamental 
for escaping local optima, and is often lacking in artificial evolutionary systems [281], having 
instead a one-to-one genotype-phenotype mapping, in which the phenotype is directly encoded 
in the genotype with no differentiation provided by instantiation (development) [281]. Neutral 
genotype-phenotype mappings have this differentiation between the genotype and phenotype 
[285], which more strongly parallels biological evolution [21]. We therefore expect the use of a 
neutral genotype-phenotype mapping to help Digital Ecosystems demonstrate behaviour more 
akin to biological ecosystems. 

2.2.1.1 Agent Aggregation 

The executable component of a semantic web service that an Agent represents is equivalent 
to an organism's DNA and is the gene (functional unit) in the evolutionary process [168]. 
So, the Agents should be aggregated as a sequence, like the sequencing of genes in DNA 
[168]. It could be argued that the Agents should be aggregated as an unordered set, or, 
based on service orchestration, into a tree or workflow, as shown in Figure 2.12. However, the 
aggregated structure of the Agents should not be the orchestration structure of the collection 
of software services that the Agents represent, not only because the service orchestration of the 
run-time instantiation is application domain-specific (e.g. trees in supply chain management 
[163], workflows in the travel industry [31]), but because it would also move it undesirably 
towards a one-to-one genotype-phenotype mapping [281]. 

Figure 2.12: Structure of Aggregated Agents: The executable component of a semantic web 
service that an Agent represents is equivalent to an organism's DNA and is the gene (functional 
unit) in the evolutionary process [168]. So, the Agents should be aggregated as a sequence, 
like the sequencing of genes in DNA [168], instead of as an unordered set or, based on service 
orchestration, as a tree or workflow. 
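
As a rough illustration of the Agent and of Agent aggregation as a sequence, the following sketch models an Agent as a lightweight, immutable record. The field names, the example URIs, and the simplification of a semantic description to a set of capability keywords are all our assumptions for illustration; real semantic web service descriptions are considerably richer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Agent:
    """A lightweight entity: mainly a pointer to the service it represents."""
    service_uri: str                       # pointer to the semantic web service (hypothetical URI)
    semantic_description: FrozenSet[str]   # simplified to a set of capability keywords
    migration_history: Tuple[str, ...] = ()  # Habitats visited so far

# An aggregation is an ordered sequence, like genes along DNA,
# not an unordered set and not the run-time orchestration structure.
AgentSequence = Tuple[Agent, ...]

def aggregate(*agents: Agent) -> AgentSequence:
    """Aggregate Agents into an ordered Agent-sequence."""
    return tuple(agents)

def collective_description(sequence: AgentSequence) -> FrozenSet[str]:
    """Union of the semantic descriptions of all Agents in the sequence."""
    return frozenset().union(*(a.semantic_description for a in sequence))

encrypt = Agent("urn:example:data-encryption", frozenset({"encryption"}))
bookshop = Agent("urn:example:book-sales", frozenset({"book-sales", "payment"}))
sequence = aggregate(encrypt, bookshop)
```

Because the sequence is ordered, `aggregate(encrypt, bookshop)` and `aggregate(bookshop, encrypt)` are distinct individuals for the evolutionary process, even though their collective descriptions are the same.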

2.2.2 Habitats 

The Habitats are the nodes of the Digital Ecosystem, and are functionally analogous to 
the habitats of a biological ecosystem [168]. Their functionality is provided by using the 
agent stations from mobile agent systems [200] (to provide a distributed environment in 
which Agent migration can occur), with evolutionary computing [90] for the Agent interaction 
(instead of traditional agent interaction mechanisms [334]), and the island-model of distributed 
evolutionary computing [178] for the connectivity between Habitats. There will be a Habitat for 
each user, which the users will typically run locally, and through which they will submit requests 
for applications. Supporting this functionality, Habitats have the following core functions: 

• Provide a subset of the Agents and Agent-sequences available globally, relevant to the user 
that the Habitat represents, and stored in what we will call an Agent-pool (for reasons 
that will be explained later). 

• Accelerate, via the Agent-pool, the Populations instantiated to evolve optimal Agent- 
sequences in response to user requests for applications. 

• Manage the inter-Habitat connections for Agent migration. 

• For service providers; manage the distribution of Agents (which represent their services) 
to other users of the Digital Ecosystem, via the network of interconnected Habitats. 

Figure 2.13: Habitat Network: Uses the agent stations from mobile agent systems [200] (to 
provide a distributed environment in which Agent migration can occur), with evolutionary 
computing [90] for the Agent interaction (instead of traditional agent interaction mechanisms 
[334]), and the island-model of distributed evolutionary computing [178] for the connectivity 
between Habitats. 

The collection of Agents at each Habitat (peer) will change over time, as the more successful 
Agents spread throughout the Digital Ecosystem, and as the less successful Agents are 
deleted. Successive user requests over time to their dedicated Habitats make this process 
possible, because the continuous and varying user requests for applications provide a dynamic 
evolutionary pressure on the Agents, which have to evolve to better satisfy those requests. So, 
the Agents will recombine and evolve over time, constantly seeking to increase their effectiveness 
for the user base. The Agent is the base unit of the evolutionary process in Digital Ecosystems, 
in the same way that the gene is the base unit for evolution in biological ecosystems [29]. So, 
the collection of Agents at each Habitat provides an Agent-pool, similar to a gene-pool, which 
is all the genes in a population [168]. Additionally, the Agent-pool stores Agent-sequences evolved from 
the Habitat's Populations, and Agent-sequences that migrate to the Habitat from other users' 
Habitats, because they can potentially accelerate future Populations instantiated to respond to 
user requests. 

The landscape, in energy-centric biological ecosystems, defines the connectivity between 
habitats [29]. Connectivity of nodes in the digital world is generally not defined by geography 
or spatial proximity, but by information or semantic proximity. For example, connectivity 
in a peer-to-peer network is based primarily on bandwidth and information content, and 
not geography. The island-models of distributed evolutionary computing use an information-centric 
model for the connectivity of nodes (islands) [178]. However, because it is generally 
defined for one-time use (to evolve a solution to one problem and then stop) it usually has 
a fixed connectivity between the nodes, and therefore a fixed topology [45]. So, supporting 
evolution in the Digital Ecosystem, with a dynamic multi-objective selection pressure (fitness 
landscape [335] with many peaks), requires a re-configurable network topology, such that 
Habitat connectivity can be dynamically adapted based on the observed migration paths of 
the Agents between the users within the Habitat network. So, based on the island-models of 
distributed evolutionary computing [178], each connection between the Habitats is bi-directional 
and there is a probability associated with moving in either direction across the connection, with 
the connection probabilities affecting the rate of migration of the Agents. However, additionally, 
the connection probabilities will be updated by the success or failure of Agent migration using 
the concept of Hebbian learning [132]: the Habitats which do not successfully exchange Agents 
will become less strongly connected, and the Habitats which do successfully exchange Agents 
will achieve stronger connections. This leads to a topology that adapts over time, resulting in a 
network that supports and resembles the connectivity of the user base. When we later consider 
an example user base, we will further discuss a resulting topology. 
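
A minimal sketch of such a Hebbian-style update follows. The text above only states that successful exchange strengthens a connection and unsuccessful exchange weakens it; the specific rule used here (moving the probability a fixed fraction towards 1 or 0), the `learning_rate` value, and the function names are our assumptions for illustration.

```python
def reinforce(probability, success, learning_rate=0.1):
    """Move a connection probability towards 1 on success and towards 0 on failure."""
    target = 1.0 if success else 0.0
    return probability + learning_rate * (target - probability)

def apply_feedback(connections, source, destination, success, learning_rate=0.1):
    """Update the directed link source -> destination, creating it if it does not yet exist.

    `connections` maps (source, destination) pairs to migration probabilities,
    so each direction of a bi-directional connection has its own probability.
    """
    current = connections.get((source, destination), 0.0)
    updated = reinforce(current, success, learning_rate)
    connections[(source, destination)] = updated
    return updated

connections = {("A", "B"): 0.5, ("B", "A"): 0.5}
apply_feedback(connections, "A", "B", success=True)    # A -> B is strengthened
apply_feedback(connections, "B", "A", success=False)   # B -> A is weakened
```

With this rule a probability always stays within [0, 1], and repeated success or failure produces diminishing changes, so a single migration outcome cannot dominate the topology.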

When a new user joins the Digital Ecosystem, a Habitat needs to be created for them, and 
most importantly connected to the correct cluster(s) in the Habitat network. A new user's 
Habitat can be connected randomly to the Habitat network, as it will dynamically reconnect 
based upon the user's behaviour. User profiling can also be used to help connect a new user's 
Habitat to the optimal part of the network, by finding a similar user or asking the user to 
identify a similar user, and then cloning their Habitat's connections. Also, when a new Habitat 
is created, its Agent-pool should be created by merging the Agent-pools of the Habitats to 
which it is initially connected. 
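
The profile-based option for creating a new Habitat can be sketched as follows, assuming a similar user has already been identified. The dictionary-based representation of connections and Agent-pools, and the name `create_habitat`, are hypothetical and chosen only for illustration.

```python
def create_habitat(new_id, similar_id, connections, agent_pools):
    """Create a Habitat for a new user by cloning a similar user's connections.

    `connections` maps a Habitat id to {neighbour id: migration probability};
    `agent_pools` maps a Habitat id to a set of Agent identifiers.
    """
    # Clone (copy, not share) the similar user's outgoing connections.
    connections[new_id] = dict(connections[similar_id])
    # Seed the new Agent-pool by merging the pools of the initially connected Habitats.
    merged = set()
    for neighbour in connections[new_id]:
        merged |= agent_pools.get(neighbour, set())
    agent_pools[new_id] = merged
    return merged

habitat_links = {"alice": {"bob": 0.6, "carol": 0.3}}
habitat_pools = {"alice": {"invoice"}, "bob": {"encrypt"}, "carol": {"encrypt", "payment"}}
create_habitat("dave", "alice", habitat_links, habitat_pools)
```

The cloned connections are only a starting point, since the migration feedback will subsequently adapt them to the new user's own behaviour.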

2.2.2.1 Agent Migration 

The Agents migrate through the interconnected Habitats combining with one another in 
Populations to meet user requests for applications. The migration path from the current Habitat 
is dependent on the migration probabilities between the Habitats. The migration of an Agent 
within the Digital Ecosystem is initially triggered by deployment to its user's Habitat, for 
distribution to other users who will potentially make use of the service the Agent represents. 
When a user deploys a service, its representative Agent must be generated and deployed to 
their Habitat. It is then copied to the Agent-pool of the user's Habitat, and from there the 
migration of the Agent occurs, which involves migrating (copying) the Agent probabilistically 
to all the connected Habitats. The Agent is copied rather than moved, because the Agent may 
also be of use to the providing user. Whether an Agent is copied to a connected Habitat depends 
on the associated migration probability; if the probability were one, then it would definitely 
be sent. When migration occurs, an exact copy of the Agent is made at the connected Habitat. 
The copy is identical to the original until its migration history is updated, which 
differentiates it from the original. The successful use of the migrated Agent, in response to user requests for 
applications, will lead to further migration (distribution) and therefore availability of the Agent 
to other users. 
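
The copy-based migration step can be sketched as below. The `Agent` record, the dictionary of per-connection probabilities, and the function name `migrate_copies` are assumptions made for this illustration only; the point is that the original stays in the source Habitat, and that each copy differs from it only by its updated migration history.

```python
import random
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Agent:
    service_uri: str                         # hypothetical identifier of the represented service
    migration_history: Tuple[str, ...] = ()  # Habitats the Agent has migrated from

def migrate_copies(agent, source, connections, agent_pools, rng):
    """Copy `agent` from `source` to each connected Habitat with that link's probability.

    `connections` maps destination Habitat ids to migration probabilities.
    Returns the list of Habitats that received a copy.
    """
    reached = []
    for destination, probability in connections.items():
        if rng.random() < probability:
            # The copy is identical except for its updated migration history.
            copy = replace(agent, migration_history=agent.migration_history + (source,))
            agent_pools.setdefault(destination, []).append(copy)
            reached.append(destination)
    return reached

original = Agent("urn:example:book-sales")
pools = {"A": [original]}
reached = migrate_copies(original, "A", {"B": 1.0, "C": 0.0}, pools, random.Random(1))
```

In this example a probability of one guarantees the copy to Habitat B, a probability of zero guarantees none to Habitat C, and the original remains available in Habitat A.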

The connections joining the Habitats are reinforced by successful Agent and Agent-sequence 
migration. The success of the migration, the migration feedback, leads to the reinforcing and 
creation of migration links between the Habitats, just as the failure of migration leads to the 
weakening and negating of migration links between the Habitats. The success of migration is 
determined by the usage of Agents at the Habitats to which they migrate. When an Agent- 
sequence is found and used in responding to a user request, then the individual Agent migration 
histories can be used to determine where they have come from and update the appropriate 
connection probabilities. If the Agent-sequence was fully or partly evolved elsewhere, then 
the Habitats where the sequence or its sub-sequences were created also need to be credited in 
the connection probabilities, because the value of an Agent-sequence lies in the unique ordering 
and combination it provides of the individual Agents contained within. So, it is necessary to manage the feedback 
to the connection probabilities for migrating Agent-sequences, and not just the individual 
Agents contained within the sequence, including the partial use of an Agent-sequence in a newly 
evolved one. Specifically, the mechanism for migration feedback needs to know the Habitats 
where migrating Agent-sequences were created, to create new connections or reinforce existing 
connections to these Habitats. The global effect of the Agent migration and migration feedback 
on the Habitat network is the clustering of Habitats around the communities present within 
the user base, and will be discussed later in more detail. 

The escape range is the number of escape migrations available to an Agent upon the risk of 
death (deletion). If an Agent migrates to a Habitat and is not used after several user requests, 
then it will have the opportunity to migrate (move not copy) randomly to another connected 
Habitat. After this happens several times the Agent will be deleted (die). The escape range 
will be dynamically responsive to the size of the Habitat cluster that the Agent exists within. 
This creates a dynamic time-to-live [58] for the Agents, in which Agents that are used more 
will live longer and distribute farther than those that are used less. 
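
The escape range behaves like a usage-sensitive time-to-live, which can be sketched as a small state machine. The `patience` threshold (how many unused requests trigger an escape), the names, and the choice to pass `escape_range` in as a plain number are our assumptions; how the escape range scales with cluster size is not specified here, so it is left as a parameter.

```python
from dataclasses import dataclass

@dataclass
class HostedAgent:
    name: str
    unused_requests: int = 0   # user requests seen at this Habitat without being used
    escapes_used: int = 0      # escape migrations already consumed

def on_request_unused(agent, escape_range, patience=3):
    """Record one user request that did not use the Agent.

    Returns "stay", "escape" (move, not copy, to a random connected Habitat),
    or "delete" once the escape range is exhausted.
    """
    agent.unused_requests += 1
    if agent.unused_requests < patience:
        return "stay"
    if agent.escapes_used >= escape_range:
        return "delete"
    agent.escapes_used += 1
    agent.unused_requests = 0  # the count restarts at the new Habitat
    return "escape"

def on_request_used(agent):
    """A successful use resets the count of unused requests."""
    agent.unused_requests = 0

hosted = HostedAgent("encrypt")
outcomes = [on_request_unused(hosted, escape_range=2, patience=2) for _ in range(6)]
```

An Agent that is used regularly never reaches the `patience` threshold, so it lives longer and keeps spreading, while an unused Agent is deleted after exhausting its escape migrations.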



2.2.3 Populations 




Figure 2.14: User Request to the Digital Ecosystem (modified from [162]): A user will formulate 
queries to the Digital Ecosystem by creating a request as a semantic description, like those being 
used and developed in Service-Oriented Architectures [254], specifying an application they desire 
and submitting it to their Habitat. A Population is then instantiated in the user's Habitat in 
response to the user's request, seeded from the Agents available at their Habitat (Agent-pool). 

The Populations of the Digital Ecosystem are functionally equivalent to the evolving, self- 
organising populations of a biological ecosystem, and are achieved through using evolutionary 
computing. A population in biological ecosystems is all the members of a species that occupy 
a particular area at a given time [168]. Our Population is also all the members of a species that 
occupy a particular area at a given time, like an island from the island-models of distributed 
evolutionary computing [178]. The use of distributed evolutionary computing to accelerate the 
Populations will be explained later. 

The users will formulate queries to the Digital Ecosystem by creating a request as a semantic 
description, like those being used and developed in Service-Oriented Architectures [254], 
specifying an application they desire and submitting it to their Habitat. This description 
enables the definition of a metric for evaluating the fitness of a composition of Agents, as a 
distance function between the semantic description of the request and the Agents' semantic 
descriptions. A Population is then instantiated in the user's Habitat in response to the user's 
request, seeded from the Agents available at their Habitat (i.e. its Agent-pool). This allows 
the evolutionary optimisation to be accelerated in the following three ways: first, by the Habitat 
network providing a subset of the Agents available globally, which is localised to the specific 
user it represents; second, by making use of Agent-sequences previously evolved in response to 
the user's earlier requests; and third, by taking advantage of relevant Agent-sequences evolved 
elsewhere in response to similar requests by other users. The Population then proceeds to 
evolve the optimal Agent-sequence(s) that fulfil the user request, and as the Agents are the 
base unit for evolution, it searches the available Agent-sequence combination space. An 
evolved Agent-sequence that is executed (instantiated) by the user then migrates to other 
peers (Habitats), becoming hosted where it is useful, to combine with other Agents in other 
Populations to assist in responding to other user requests for applications. 



2.2.3.1 Evolution 



Evolution in biological ecosystems leads to both great diversity and high specialisation of its 
organisms [29]. In Digital Ecosystems the diversity of evolution will provide for the wide 
range of user needs and allow for quick responses to the changing of these user needs, while 
the specialisation will simultaneously provide solutions which are tailored to fulfil specific 
user requests. We will consider the issue of diversity in a later subsection, because it is 
achieved through evolution in a distributed environment, which will be discussed later. In 
biological ecosystems, evolutionary specialisation is localised to a population within its micro- 
habitat, which allows for the creation of niches (high specialisation) [168]. So, a Population is 
instantiated in the user's own Habitat, where the collection of Agents is chosen for the user, 
and the micro-Habitat is provided by the user request. There is nothing to preclude more 
than one Population being instantiated in a user's Habitat at any one time, provided 
sufficient computational resources are available. 

A selection pressure is the sum aggregate of the forces acting upon a population, resulting in 
genetic change through natural selection [168]. Those organisms best fit to survive the selection 
pressures operating upon them will pass on their biological fitness to their progeny through 
the inheritance process [168]. The fitness of an individual Agent-sequence within a Population 
is determined by a selection pressure, applied as a fitness function [90] instantiated from the 
user request, and works primarily on comparing the semantic descriptions of the Agents with 
the semantic description of the user request. The pressure selects for those Agent-sequences 
that are fit and capable of surviving the environment to reproduce, and against those that do 
not have sufficient fitness and therefore die before passing on their genes, thereby providing 
the direction for genetic change. In biology fitness is a measure of an organism's success in its 
environment [168], and its definition here will be further explained in the next subsection. 

Genes are the functional unit in biological evolution [168]; whereas here the functional unit 
is the Agent. Therefore, the evolutionary process of a Population provides a combinatorial 
optimisation [240] of the Agents available, when responding to a user request. So, it does not 
change or mutate the Agents themselves. In biology a mutation is a permanent transmissible 
change (over the generations) in the genetic material (DNA) of an individual, and recombination 
(e.g. crossover) is the formation within the offspring of alleles (gene combinations), which are 
not present in the parents [168]. As in genetic algorithms [110], mutations will occur by 
switching Agents in and out of the Agent-sequence structure, and recombination (crossover) 
will occur by performing a crossing of two Agent-sequences. 
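
These two operators can be sketched as follows. We interpret "switching Agents in and out" as replacing one position of the sequence with an Agent drawn from the Agent-pool, and crossover as a one-point crossing; both interpretations, and the function names, are assumptions for illustration. The Agents themselves (here plain strings standing in for Agents) are never modified.

```python
import random

def mutate(sequence, agent_pool, rng):
    """Switch one Agent out of the sequence and another in from the Agent-pool."""
    child = list(sequence)
    if child and agent_pool:
        position = rng.randrange(len(child))
        child[position] = rng.choice(agent_pool)
    return child

def crossover(first, second, rng):
    """One-point crossover: the head of `first` joined to the tail of `second`."""
    cut = rng.randint(0, min(len(first), len(second)))
    return list(first[:cut]) + list(second[cut:])

rng = random.Random(0)
agent_pool = ["encrypt", "payment", "shipping", "catalogue"]
parent_a = ["encrypt", "payment", "shipping"]
parent_b = ["catalogue", "shipping", "payment"]
mutant = mutate(parent_a, agent_pool, rng)
offspring = crossover(parent_a, parent_b, rng)
```

Both operators return new sequences and leave the parents untouched, so the process remains a combinatorial optimisation over the available Agents rather than a modification of the services they represent.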

As the Digital Ecosystem receives more and more sophisticated requests, so more and more 
complex applications are evolved and become available for use by the users. Achieving this 
evolution, specifically the Agent-sequence recombination and optimisation, is a very significant 
challenge, because of the range of services that must be catered for and the potentially huge 
number of factors that must be considered for creating an applicable fitness function. First, 
constructing ever more complex software solutions requires modularity, which is provided by 
the paradigm of service interoperability from Service-Oriented Architectures [228]. Second, 
two of the most important issues are those of defining fitness and managing bloat, which we will 
discuss next. Finally, there is a huge body of work and continuing research regarding theoretical 
approaches to evolutionary computing [90], including the extensive use of genetic algorithms for 
practical real-world problem solving [85]. In defining Digital Ecosystems we should make use of 
the current state-of-the-art, and future developments, in the areas of evolutionary computing 
[144] and service interoperability [228]. 

2.2.3.2 Fitness 

In biology fitness is a measure of how successful an organism is in its environment, i.e. its 
phenotype [168]. The fitness of an Agent-sequence within a Population would also, ideally, 
be based upon its phenotype, the run-time instantiation, and nothing else. However, such 
an approach would be impractical, because it is currently infeasible to execute all the Agent- 
sequences of a Population at every generation, not least because of the computational 
resources that would be required. The other concern is one of practicality, by which we mean 
that it may not even be possible to perform a live execution for the executable components of an 
Agent-sequence; for example, if they are for buying an item from an online retailer. These are 
well known issues in evolutionary computing, which is why fitness functions are often defined 
as simulated input/output pairs to test functionality [191]. In Digital Ecosystems we can use 
historical usage information, but this would be insufficient initially, because such information 
would not be available at the time of an Agent's deployment. However, because each Agent also 
carries a semantic description, a specification of what it does, the fitness function can measure a 
complete Agent-sequence's collective semantic descriptions relative to the semantic description 
of a user request. So, initially the fitness function should be based primarily on comparing the 
semantic descriptions of the Agent-sequences to the semantic description of the user request, 
ever increasingly augmented with the growing usage information available for the Agents. In 
biological terms the genotype will be used as the phenotype, combined with any available past 
fitness of the phenotype; with the Agent's semantic description (genotype) therefore acting as a 
guarantee of its expected behaviour. So, for any newly deployed Agent a one-to-one genotype- 
phenotype mapping [281] will initially exist, until sufficient usage information is available. While 
the use of such a mapping is undesirable, it is temporary, and necessary to allow Digital 
Ecosystems to operate effectively. 

We have already suggested that the primary driver of the evolutionary process should initially 
be the extent by which an Agent-sequence can verifiably satisfy the specified requirements. This 
could be measured probabilistically, or using theorem-proving to validate the system, though 
automatic theorem proving is notoriously slow [289, 277]. However, there will also be other 
pressures on the fitness. For example, one may seek the most parsimonious solution to a problem 
(one that provides exactly the specified features and no more), or the cheapest solution, or one 
with a good reputation. Some aspects of fitness will be implicit in the evolutionary process (e.g. 
Agents which are often used will gain more fitness) while others will require explicit measures 
(e.g. price, or user satisfaction). One way to handle this multiplicity of fitness values (some 
qualitative) is to explicitly recognise the multi-objective nature of the optimisation problem. In 
this way, we are seeking not the single best solution, but a range of possible compromises that 
can be made most optimally. The set of solutions for which there are no better compromises is 
called the Pareto-set, and evolutionary techniques have been adapted to solve such problems 
with considerable success [318]. The main point is that selection has to be driven not by an 
absolute value of fitness, but rather by a notion of what it means for one solution to be better 
than another. We say one solution dominates another if it is better in at least one respect, and
no worse in any of the others [98]. 
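
The dominance relation and the resulting Pareto-set can be illustrated with a minimal sketch. It assumes that each candidate solution is scored on several objectives where higher is better; the objective names (semantic match and reputation) and the function names `dominates` and `pareto_set` are hypothetical and chosen only for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one.

    Assumption: objectives are tuples of numbers where higher is better.
    """
    no_worse = all(x >= y for x, y in zip(a, b))
    better_somewhere = any(x > y for x, y in zip(a, b))
    return no_worse and better_somewhere


def pareto_set(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical scores: (semantic match, reputation)
candidates = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = pareto_set(candidates)
```

Here `(0.4, 0.4)` is dominated by `(0.5, 0.5)` and is dropped, while the other three candidates are mutually non-dominated compromises, so selection can only rank them relative to one another.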



2.2.3.3 Bloat 



If the repetition of Agents is allowed within evolving Agent-sequences, then the search space 
can become countably infinite, because the nature of the problem to be solved may not allow us 
to determine what the length of a solution is beforehand. Therefore, a variable length approach 
must be adopted, which is common in genetic programming [156]. When variable length 
representations of solutions are used, a well-known phenomenon arises, called bloat, in which 
the individuals of an evolving population tend to grow in size without gaining any additional 
advantage [166]. The bloat phenomenon can cause early termination of an evolutionary process 
due to the exhaustion of the available memory, and can also significantly reduce performance, 
because typically longer sequences have higher fitness computation costs [259]. Bloat is not 
specific to genetic programming, and is inherent in search techniques with discrete variable 
length representations [164]. It is a fundamental area of research within search-based approaches 
such as genetic algorithms, genetic programming and other approaches not based on populations 
such as simulated annealing [164]. However, considerable work on bloat has been done in 
connection with genetic programming [165, 22], and we believe that the genetic algorithms 
community generally, and the genetic-algorithms-based approach of our Digital Ecosystems 
specifically, can benefit directly from this research. While bloat is a phenomenon which was 
first observed in practice [156], theoretical analyses have been attempted [23]. One should take 
care with these approaches as implementations will always deal with finite populations, while 
theoretical approaches often deal with infinite populations [156], and this difference can be 
important. Yet, both theoretical and empirical approaches are required to understand bloat. 
There are many factors contributing to bloat, and while the phenomenon may appear simple, 
the reasons are not. There are several theories to explain why this occurs, and, as we shall 
discuss, some measures that can be taken for its prevention. 

There are several different qualitative theories which attempt to explain bloat, and they can 
be considered in two groups. First, protection against crossover and bias removal (which can 
be considered jointly) and second, the nature of programme search spaces [23]. First, near 
the end of a run a Population consists of mostly fit individuals, and any crossover is likely 
to be detrimental to the fitness of the offspring. In any sequence of Agents there may be 
Agents that do not contribute semantically to the complete functionality of the sequence if,
for example, their functionality was not requested by the user or if it is duplicated in the 
sequence; analogously to genetic programming [23], we can call these redundant Agents bloat. 
The genotype can then be grown further without affecting the phenotype if Agents with similar 
functionality are added; but, as the genotype grows larger, crossover is more likely to transfer 
redundant Agents to the new offspring (assuming uniform crossover). Second, above a certain
threshold size, the distribution of functionality does not vary with the size of the search space 
[23]. Thus, if we randomly sample long and short Agent-sequences above a length threshold, 
they will likely have the same functionality and fitness. So, as a search process progresses we 
are more likely to sample longer Agent-sequences, as mutation results in more of them (all 
other things being equal) and this will give rise to the bloating phenomenon. 

Each of the stages of construction of a genetic algorithm (i.e. choice of fitness function, selection 
method and genetic operator) can affect bloat. It has been shown that even small differences 
in the fitness function can cause a difference: a single programme glitch in an otherwise flat 
fitness landscape (from the neutral theory of molecular evolution [149]) is sufficient to drive 
the average programme size of an infinite population [204]. If a fitness-proportional selection 
method is used, individuals with zero fitness will be discontinued as they have zero probability 
of being selected as parents [36]. However, if a tournament selection method is used, then there
is a finite chance that individuals with zero fitness will be selected to be parents [36]. Finally, 
the choice of genetic operator affects the size of the programmes which are sampled; standard 
crossover on a flat landscape heavily oversamples the shorter programmes [251]. There are 
other factors that may affect bloat, for example, how the population is initialised, or the choice 
of representation used, such as a neutral genotype-phenotype mapping, which can actually 
alleviate bloat [208]. 

Whatever its causes, bloat occurs in this type of optimisation and needs to be
controlled if the space is to be searched effectively. One solution is to apply a hard limit to the
size of the sets that can be sampled [166]: this enables the search algorithm to keep running
without out-of-memory run-time errors, but raises the question of how to set this hard
limit. An alternative but similar method is to apply a parsimony pressure, where a term is
added to the fitness function which penalises larger sets in favour of smaller sets [296]. In
this approach, individuals larger than the average size are evaluated with a reduced probability,
biasing the search to smaller sets, while providing a dynamic limit which adapts to the average
size of individuals in a changing population [296].
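
A parsimony pressure tied to the population average can be sketched as follows. This is not the method of [296]: for simplicity it scales fitness deterministically rather than evaluating larger individuals with a reduced probability, and the function name `parsimony_adjusted` and the `penalty` coefficient are assumptions made for illustration.

```python
def parsimony_adjusted(raw_fitness, length, mean_length, penalty=0.05):
    """Reduce the fitness of individuals that are longer than the population average.

    Assumption: fitness is divided by (1 + penalty * excess length); individuals
    at or below the average length are left unchanged.
    """
    excess = max(0.0, length - mean_length)
    return raw_fitness / (1.0 + penalty * excess)


# Three individuals with equal raw fitness but different lengths.
lengths = [2, 4, 6]
mean_length = sum(lengths) / len(lengths)
adjusted = [parsimony_adjusted(0.8, n, mean_length) for n in lengths]
```

Because the threshold is the current mean length, the limit moves with the population instead of being fixed in advance, which is the dynamic property described above.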

2.2.4 The Digital Ecosystem

The Digital Ecosystem supports the automatic combining of numerous Agents (which represent 
services), by their interaction in evolving Populations to meet user requests for applications, in 
a scalable architecture of distributed interconnected Habitats. The sharing of Agents between 
Habitats ensures the system is scalable, while maintaining a high evolutionary specialisation for 
each user. The network of interconnected Habitats is equivalent to the abiotic environment of 
biological ecosystems [29]; combined with the Agents, the Populations, the Agent migration for 
distributed evolutionary computing, and the environmental selection pressures provided by the 
user base, then the union of the Habitats creates the Digital Ecosystem, which is summarised 
in Figure 2.15. The continuous and varying user requests for applications provide a dynamic 
evolutionary pressure on the Agent sequences, which have to evolve to better fulfil those user 
requests, and without which there would be no driving force to the evolutionary self-organisation 
of the Digital Ecosystem. 

In the Digital Ecosystem, local and global optimisations concurrently operate to determine 
solutions to satisfy different optimisation problems. The global optimisation here is not a 
decentralised super-peer based control mechanism [263], but the completely distributed peer- 
to-peer network of the interconnected Habitats, which are therefore not susceptible to the failure 
of super-peers. It provides a novel optimisation technique inspired by biological ecosystems, 
working at two levels: a first optimisation, migration of Agents which are distributed in a 
peer-to-peer network, operating continuously in time; this process feeds a second optimisation, 
based on evolutionary combinatorial optimisation, operating locally on single peers and aimed
at finding solutions that satisfy locally relevant constraints. So, the local search is improved
through this twofold process to yield better local optima faster, as the distributed optimisation 
provides prior sampling of the search space through computations already performed in other 
peers with similar constraints. This novel form of distributed evolutionary computing will be 
discussed further below, once we have discussed a topology resulting from an example user 
base. 

Figure 2.15: Digital Ecosystem: A network of interconnected Habitats, combined with the
Agents, the Populations, the Agent migration for distributed evolutionary computing, and the
environmental selection pressures provided by the user base, then the union of the Habitats
creates the Digital Ecosystem. Agents travel along the peer-to-peer connections; in every node
(Habitat) local optimisation is performed through an evolutionary algorithm, where the search
space is determined by the Agents present at the node.

If we consider an example user base for the Digital Ecosystem, the use of Service-Oriented
Architectures in its definition means that business-to-business (B2B) interaction scenarios [158]
lend themselves to being a potential user base for Digital Ecosystems. So, we can consider
the business ecosystem of Small and Medium sized Enterprise (SME) networks from Digital
Business Ecosystems [222], as a specific class of examples for B2B interaction scenarios; and in
which the SME users are requesting and providing software services, represented as Agents
in the Digital Ecosystem, to fulfil the needs of their business processes. Service-Oriented 
Architectures promise to provide potentially huge numbers of services that programmers 
can combine, via the standardised interfaces, to create increasingly more sophisticated and 
distributed applications [241]. The Digital Ecosystem extends this concept with the automatic 
combining of available and applicable services, represented by Agents, in a scalable architecture, 
to meet user requests for applications. These Agents will recombine and evolve over time, 
constantly seeking to improve their effectiveness for the user base. From the SME users' point 
of view the Digital Ecosystem provides a network infrastructure where connected enterprises 
can advertise and search for services (real-world or software only), putting a particular emphasis
on the composability of loosely coupled services and their optimisation to local and regional
needs and conditions. To support these SME users the Digital Ecosystem satisfies the
companies' business requirements by finding the most suitable services or combination of 
services (applications) available in the network. A composition of services is an Agent-sequence 
in the Habitat network that can move from one peer (company) to another, being hosted only 
in those where it is most useful in satisfying the SME users' business needs. 



2.2.4.1 Topology 



The Digital Ecosystem allows for the connectivity in the Habitats to adapt to the connectivity 
within the user base, with a cluster of Habitats representing a community within the user 
base. If a user is a member of more than one community, the user's Habitat will be in 
more than one cluster. This leads to a network topology that will be discovered with time, 
and which reflects the connectivity within the user base. Similarities in requests by different 
users will reinforce behavioural patterns, and lead to clustering of the Habitats within the 
ecosystem, which can occur over geography, language, etc. This will form communities for 
more effective information sharing, the creation of niches, and will improve the responsiveness 
of the system. The connections between the Habitats will be self-managed, through the 
mechanism of Agent migration defined earlier. Essentially, successful Agent migration will 
reinforce Habitat connections, thereby increasing the probability of future Agent migration 
along these connections. If a successful multi-hop migration occurs, then a new link between 
the start and end Habitats can be formed. Unsuccessful migrations will lead to connections 
(migration probabilities) decreasing, until finally the connection is closed. 
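
The self-managed connections can be sketched as a table of migration probabilities that is updated after each migration attempt. The class name `HabitatNetwork`, the initial probability, the step size, and the closing threshold (`floor`) are all assumptions made for illustration; the text above does not specify them.

```python
class HabitatNetwork:
    """Undirected Habitat links, each carrying a migration probability."""

    def __init__(self, initial=0.5, step=0.1, floor=0.1):
        # All three values are illustrative assumptions.
        self.initial, self.step, self.floor = initial, step, floor
        self.links = {}  # frozenset({a, b}) -> migration probability

    def connect(self, a, b):
        """Open a link if one does not already exist."""
        self.links.setdefault(frozenset((a, b)), self.initial)

    def record_migration(self, a, b, successful):
        """Reinforce the link on success, weaken it on failure."""
        key = frozenset((a, b))
        if key not in self.links:
            return
        delta = self.step if successful else -self.step
        probability = min(1.0, self.links[key] + delta)
        if probability < self.floor:
            del self.links[key]  # the connection is closed
        else:
            self.links[key] = probability

    def record_multi_hop(self, path, successful):
        """A successful multi-hop migration links the start and end Habitats."""
        if successful and len(path) > 2:
            self.connect(path[0], path[-1])


network = HabitatNetwork()
network.connect("H1", "H2")
network.connect("H2", "H3")
network.record_migration("H1", "H2", successful=True)
network.record_multi_hop(["H1", "H2", "H3"], successful=True)
```

After these calls the H1 to H2 link has been strengthened and a direct H1 to H3 link exists, so repeated success gradually rewires the network towards the communities in the user base.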

If we consider the business ecosystem - a network of Small and Medium sized Enterprises 
from Digital Business Ecosystems [222] - as an example user base, such business networks 
are typically small-world networks [333, 338]. They have many strongly connected clusters 
(communities), called sub-networks (quasi-complete graphs), with a few connections between 
these clusters (communities) [327]. Graphs with this topology have a very high clustering 
coefficient and small characteristic path lengths [327]. As the connections between Habitats are 
reconfigured depending on the connectivity of the user base, the Habitat clustering will therefore 
be parallel to the business sector communities, as shown in Figure 2.16. The communities will 
cluster over language, nationality, geography, etc. - all depending on the user base. So, the 
Digital Ecosystem will take on a topology similar to that of the user base. 

Figure 2.16: Digital Business Ecosystem: Business ecosystem, network of Small and Medium
sized Enterprises [222], using the Digital Ecosystem. As the connections between Habitats are 
reconfigured depending on the connectivity of the user base, the Habitat clustering will therefore 
be parallel to the business sector communities. 

Fragmentation of the Habitat network can occur, but only if dictated by the structure of the 
user base. The issue of greater concern is when individual Habitats become totally disconnected, 
which can only occur under certain conditions. One condition is that the Agents within the 
Agent-pool consistently fail to satisfy user requests. Another condition is when the Agents and 
Agent-sequences they share are undesirable to the users that are within the migration range 
of these Agents and Agent-sequences. These scenarios can arise because the Habitat is located 
within the wrong cluster, in which case the user can be asked to join another cluster within the 
Habitat network, assuming the user base is of sufficient size to provide a viable alternative. 




Figure 2.17: Habitat Clustering: Topology adapted to the small-world network of a business 
ecosystem of SMEs from Digital Business Ecosystems [222], having many strongly connected 
clusters (communities), called sub-networks (quasi-complete graphs), with a few connections
between these clusters (communities) [327]. Graphs with this topology have a very high clustering 
coefficient and small characteristic path lengths [327]. 

2.2.4.2 Distributed Evolution



The Digital Ecosystem is a hybrid of Multi-Agent Systems, more specifically of mobile agent 
systems, Service-Oriented Architectures, and distributed evolutionary computing, which leads
to a novel form of evolutionary computation. The novelty comes from the creation of multiple 
evolving Populations in response to similar requests, whereas in the island-models of distributed 
evolutionary computing there are multiple evolving populations in response to only one request 
[178]. So, in our Digital Ecosystem different requests are evaluated on separate islands 
(Populations), with their evolution accelerated by the sharing of solutions between the evolving 
Populations (islands), because they are working to solve similar requests (problems). This 
is shown in Figure 2.18, where the dashed yellow lines connecting the evolving Populations 
indicate similarity in the requests being managed. 

Figure 2.18: Distributed Evolution in the Digital Ecosystem: Different requests are evaluated
on separate islands (Populations), with their evolution accelerated by the sharing of solutions 
between the evolving Populations (islands), because they are working to solve similar requests 
(problems). The yellow lines connecting the evolving Populations indicate similarity in the 
requests being managed. 



If we again consider the business ecosystem of Small and Medium sized Enterprises from Digital 
Business Ecosystems [222] as an example user base, then in Figure 2.18 the four Habitats, in 
the left cluster, could be travel agencies, and the three with linked evolving Populations are 
looking for similar package holidays. So, an optimal solution found and used in one Habitat 
will be migrated to the other connected Habitats and integrated into any evolving Populations 
via the local Agent-pools. This will help to optimise the search for similar package holidays at 
the Habitats of the other travel agencies. This also works in a time-shifted manner, because 
an optimal solution is stored in the Agent-pool of the Habitats to which it is migrated, being 
available to optimise a similar request placed later. 
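
The sharing of a solution between connected Habitats, including the time-shifted reuse, can be sketched as copying the solution into neighbouring Agent-pools. The function name `migrate_solution`, the dictionary layout of `links` and `agent_pools`, and the Habitat and solution names are hypothetical; the per-connection probabilities stand for the migration probabilities described in the Topology section.

```python
import random


def migrate_solution(solution, source, links, agent_pools, rng):
    """Attempt to copy a used solution from `source` to each connected Habitat.

    Assumptions: `links` maps (source, target) pairs to a migration
    probability, and `agent_pools` maps each Habitat to a list of stored
    Agents and Agent-sequences.
    """
    reached = []
    for (start, target), probability in links.items():
        if start == source and rng.random() < probability:
            agent_pools[target].append(solution)  # stored for later requests
            reached.append(target)
    return reached


links = {("agency_1", "agency_2"): 1.0, ("agency_1", "agency_3"): 0.0}
agent_pools = {"agency_1": [], "agency_2": [], "agency_3": []}
reached = migrate_solution("package_holiday_sequence", "agency_1",
                           links, agent_pools, random.Random(0))
```

The copy stays in the receiving Agent-pool, so a Population instantiated there for a similar request placed later can be seeded with it.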

The distributed architecture of Digital Ecosystems favours the use of Pareto-sets for fitness
determination, because Pareto optimisation for multi-objective problems is usually most 
effective with spatial distribution of the populations, as partial solutions (solutions to different 
niches) evolve in different parts of a distributed population [75] (i.e. different Populations in 
different Habitats). By contrast, in a single population, individuals are always interacting with 
each other, via crossover, which does not allow for this type of specialisation [13]. 

This approach requires the Digital Ecosystem to have a sufficiently large user base, so that 
there can be communities within the user base, and therefore allow for similarity in the user 
requests. Assuming a user base of hundreds of users, then there would be hundreds of Habitats, 
in which there will be potentially three or more times the number of Populations at any one 
time. Then there will be thousands of Agents and Agent-sequences (applications) available to 
meet the requests for applications from the users. In such a scenario, there would be a sufficient 
number of users for the Digital Ecosystem to find similarity within their requests, and therefore 
apply our novel form of distributed evolutionary computing. 



2.2.4.3 Agent Life-Cycle 



An Agent is created to represent a user's service in the Digital Ecosystem, and its life-cycle 
begins with deployment to its owner's Habitat for distribution within the Habitat network. The 
Agent is then migrated to any Habitats connected to the owner's Habitat, to make it available 
in other Habitats where it could potentially be useful. The Agent is then available to the local 
evolutionary optimisation, to be used in evolving the optimal Agent-sequence in response to a 
user request. The optimal Agent-sequence is then registered at the Habitat, being stored in 
the Habitat's Agent-pool. If an Agent-sequence solution is then executed, an attempt is made 
to migrate (copy) it to every other connected Habitat, success depending on the probability 
associated with the connection. The Agent life-cycle is shown in Figure 2.19. 

An Agent can also be deleted if after several successive user requests at a Habitat it remains 
unused; it will have a small number of escape migrations, in which it is not copied, but is 
randomly moved to another connected Habitat. If the Agent fails to find a niche before running 
out of escape migrations, then it will be deleted. 
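
The deletion rule can be sketched as a small state update applied to an Agent after each user request at its Habitat. The names `Agent` and `after_request`, the number of escape migrations, and the `patience` threshold (how many successive unused requests trigger an escape) are assumptions; the text above only says "several" requests and "a small number" of escapes.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    unused_requests: int = 0
    escapes_left: int = 3  # assumed "small number" of escape migrations


def after_request(agent, used, here, neighbours, rng, patience=5):
    """Return the Habitat the Agent occupies after a request, or None if deleted."""
    if used:
        agent.unused_requests = 0
        return here
    agent.unused_requests += 1
    if agent.unused_requests < patience:
        return here
    # Assumption: an Agent with no connected Habitat to escape to is deleted.
    if agent.escapes_left == 0 or not neighbours[here]:
        return None
    agent.escapes_left -= 1
    agent.unused_requests = 0
    return rng.choice(neighbours[here])  # moved, not copied


neighbours = {"H1": ["H2"], "H2": ["H1"]}
agent = Agent("flight_booking", escapes_left=1)
rng = random.Random(0)
trace = []
location = "H1"
while location is not None:
    location = after_request(agent, False, location, neighbours, rng, patience=2)
    trace.append(location)
```

With one escape migration and a patience of two, the unused Agent stays at H1 once, escapes to H2, stays there once, and is then deleted.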

Figure 2.19: Agent Life-Cycle: Begins with deployment to its owner's Habitat for distribution
within the Habitat network. It can then be used in evolving the optimal Agent-sequence in 
response to a user request. The optimal Agent-sequence is then registered at the Habitat. If 
an Agent-sequence solution is then executed, an attempt is made to migrate (copy) it to every 
other connected Habitat, success depending on the probability associated with the connection. 



2.3 Simulation and Results 



We simulated the Digital Ecosystem, based upon our Ecosystem-Oriented Architecture, and 
recorded key variables to determine whether it displayed behaviour typical of biological 
ecosystems. Although agent-based modelling solutions, like Repast (Recursive Porous Agent 
Simulation Toolkit) [57] and MASON (Multi-Agent Simulator Of Neighbourhoods) [182], and
evolutionary computing libraries, like ECJ (Evolutionary Computing in Java) [183] and the 
JCLEC (Java Computing Library for Evolutionary Computing) [319], are available, it was 
evident that it would take as much effort to adapt one, or a combination, of these to simulate 
the Digital Ecosystem, as it would to create our own simulation of the Digital Ecosystem, 
because the required ecological dynamics are largely absent from these and other available 
technologies. So, we created our own simulation, following the Ecosystem-Oriented Architecture 
from the previous section (unless otherwise specified), using the business ecosystem of Small 
and Medium sized Enterprises from Digital Business Ecosystems [222] as an example user base. 



2.3.1 Agents: Semantic Descriptions 



An Agent represents a user's service, including the semantic description of the business process
involved, and is based on existing and emerging technologies for semantically capable
Service-Oriented Architectures [254], such as the OWL-S semantic markup for web services [197]. We
simulated a service's semantic description with an abstract representation consisting of a set of
numeric tuples, to simulate the properties of a semantic description. Each tuple represented
an attribute of the semantic description, with one integer for the attribute identifier and one for the
attribute value, both ranging between one and a hundred. Each simulated Agent had a
semantic description with between three and six tuples, as shown in Figure 2.20.

A = {(1,25), (2,35), (3,55), (4,6), (5,37), (6,12)}

Figure 2.20: Agent Semantic Descriptions: Each simulated Agent had a semantic description
with an abstract representation consisting of a set of between three and six numeric tuples;
each tuple representing an attribute of the semantic description, one integer for the attribute
identifier and one for the attribute value, with both ranging between one and a hundred.
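
A simulated semantic description of this kind could be generated as in the sketch below. The function name `random_semantic_description`, the uniform sampling, and the choice of distinct attribute identifiers within one description are assumptions; only the tuple structure, the size of three to six tuples, and the range of one to a hundred come from the text.

```python
import random


def random_semantic_description(rng):
    """Return a set of three to six (attribute identifier, attribute value) tuples.

    Assumptions: identifiers are distinct within one description, and both
    identifiers and values are drawn uniformly from 1..100.
    """
    size = rng.randint(3, 6)
    identifiers = rng.sample(range(1, 101), size)
    return {(identifier, rng.randint(1, 100)) for identifier in identifiers}


rng = random.Random(42)
descriptions = [random_semantic_description(rng) for _ in range(5)]
```

Each element of `descriptions` has the same shape as the set A in Figure 2.20.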



2.3.2 User Base 



R = [{(1,23),(2,45),(3,33),(4,6),(5,8),(6,16)}, {(1,84), (2,48), (3,53), (4,11),(5,16)}] 

Figure 2.21: User Request: A simulated user request consisted of an abstract semantic 
description, as a list of sets of numeric tuples to represent the properties of a desired business 
application; each tuple representing an attribute of the semantic description, one integer for 
the attribute identifier and one for the attribute value, with both ranging between one and a 
hundred. 

Throughout the simulations we assumed a hundred users, which meant that at any time the 
number of users joining the network equalled those leaving. The Habitats of the users were 
randomly connected at the start, to simulate the users going online for the first time. The users 
then produced Agents (services) and requests for business applications. Initially, the users each 
deployed five Agents to their Habitats, for migration (distribution) to any Habitats connected 
to theirs (i.e. their community within the business ecosystem). Users were simulated to deploy
a new Agent after the submission of three requests for business applications, and were chosen 
at random to submit their requests. A simulated user request consisted of an abstract semantic 
description, as a list of sets of numeric tuples to represent the properties of a desired business 
application. The use of the numeric tuples made it comparable to the semantic descriptions 
of the services represented by the Agents; while the list of sets (two level hierarchy) and a 
much longer length provided sufficient complexity to support the sophistication of business 
applications. An example is shown in Figure 2.21. 

The user requests were handled by the Habitats instantiating evolving Populations, which used
evolutionary computing to find the optimal solution(s), Agent-sequence(s). It was assumed 
that the users made their requests for business applications accurately, and always used the 
response (Agent-sequence) provided. 



2.3.3 Populations: Evolution 

Populations of Agents, $[A_1, A_1, A_2, \ldots]$, were evolved to solve user requests, seeded with Agents
and Agent-sequences from the Agent-pool of the Habitats in which they were instantiated. A
dynamic population size was used to ensure exploration of the available combinatorial search
space, which increased with the average length of the Population's Agent-sequences. The
optimal combination of Agents (Agent-sequence) was evolved to the user request $R$, by an
artificial selection pressure created by a fitness function generated from the user request $R$.
An individual (Agent-sequence) of the Population consisted of a set of attributes, $a_1, a_2, \ldots$,
and a user request essentially consisted of a set of required attributes, $r_1, r_2, \ldots$. So, the fitness
function for evaluating an individual Agent-sequence $A$, relative to a user request $R$, was

$$\mathrm{fitness}(A, R) = \frac{1}{1 + \sum_{r \in R} |r - a|}, \qquad (2.1)$$

where $a$ is the member of $A$ such that the difference to the required attribute $r$ was minimised.
Equation 2.1 was used to assign fitness values between 0.0 and 1.0 to each individual of the
current generation of the population, directly affecting their ability to replicate into the next
generation. The evolutionary computing process was encoded with a low mutation rate, a fixed
selection pressure and a non-trapping fitness function (i.e. one that did not get trapped at local optima).
The type of selection used was fitness-proportional and non-elitist, fitness-proportional meaning that
the fitter the individual the higher its probability of surviving to the next generation [36]. Non- 
elitist means that the best individual from one generation was not guaranteed to survive to the 
next generation; it had a high probability of surviving into the next generation, but it was not 
guaranteed as it might have been mutated [90]. Crossover (recombination) was then applied 
to a randomly chosen 10% of the surviving population, a one-point crossover, by aligning two 
parent individuals and picking a random point along their length, and at that point exchanging 
their tails to create two offspring [90]. Mutations were then applied to a randomly chosen 10% 
of the surviving population; point mutations were randomly located, consisting of insertions (an 
Agent was inserted into an Agent-sequence), replacements (an Agent was replaced in an
Agent-sequence), and deletions (an Agent was deleted from an Agent-sequence) [168]. The issue of
bloat was controlled by augmenting the fitness function with a parsimony pressure [296] which 
biased the search to shorter Agent-sequences, evaluating longer than average length Agent- 
sequences with a reduced fitness, and thereby providing a dynamic control limit which adapted 
to the average length of the ever-changing evolving Agent Populations. 
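
The fitness evaluation and genetic operators described above can be sketched as follows. The sketch assumes Equation 2.1 has the form 1 / (1 + sum over required attributes r of the smallest |r - a|), represents each Agent as a set of (identifier, value) tuples, and measures the difference between two tuples as a Manhattan distance; that distance, the handling of empty sequences, and all function names are assumptions rather than details of the original simulation.

```python
import random


def difference(r, a):
    # Assumption: Manhattan distance between (identifier, value) tuples.
    return abs(r[0] - a[0]) + abs(r[1] - a[1])


def fitness(sequence, request):
    """Assumed form of Equation 2.1, giving a value in (0, 1]."""
    attributes = [attr for agent in sequence for attr in agent]
    if not attributes:
        return 0.0  # assumption: an empty Agent-sequence has no fitness
    required = [r for part in request for r in part]
    total = sum(min(difference(r, a) for a in attributes) for r in required)
    return 1.0 / (1.0 + total)


def one_point_crossover(parent_1, parent_2, rng):
    """Align two parents, pick a random point, and exchange their tails."""
    limit = min(len(parent_1), len(parent_2))
    if limit < 2:
        return parent_1[:], parent_2[:]
    point = rng.randint(1, limit - 1)
    return (parent_1[:point] + parent_2[point:],
            parent_2[:point] + parent_1[point:])


def point_mutation(sequence, agent_pool, rng):
    """Insert, replace, or delete one Agent at a random location."""
    mutated = sequence[:]
    kind = rng.choice(["insert", "replace", "delete"])
    if kind == "insert" or not mutated:
        mutated.insert(rng.randint(0, len(mutated)), rng.choice(agent_pool))
    elif kind == "replace":
        mutated[rng.randrange(len(mutated))] = rng.choice(agent_pool)
    elif len(mutated) > 1:
        del mutated[rng.randrange(len(mutated))]
    return mutated


flight = frozenset({(1, 25), (2, 35)})
hotel = frozenset({(1, 84), (2, 48)})
request = [{(1, 25), (2, 35)}, {(1, 84), (2, 48)}]
score = fitness([flight, hotel], request)
```

In this example every required attribute is matched exactly, so `score` is 1.0; an Agent-sequence missing the hotel Agent would accumulate a non-zero difference for the second part of the request and score lower.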



2.3.4 Semantic Filter 

Agent's semantic description: 
{(1,25), (2,35), (3,55), (4,6), (5,37), (6,12)} 

(with semantic filter): 
{(Business, Airline), (Company, British Midland), (Quality, Economy), 
(Cost, 60), (Depart, Edinburgh), (Arrive, London)} 

user request: 

[{(1,23), (2,45), (3,33), (4,6), (5,8), (6,16)}, {(1,84), (2,48), (3,53), (4,11), 
(7,16), (8,34)}, {(1,23), (2,45), (3,53), (4,6), (5,16), (6,53)}, {(1,86),
(2,48), (3,33), (4,25), (7,55), (8,23)}, {(1,25), (2,52), (3,53), (4,5), (5,55),
(6,37)}, {(1,86), (2,48), (3,43), (4,25), (7,37), (8,40)}, {(1,22), (2,77), 
(3,82), (4,9), (5,35), (6,8)}] 

(with semantic filter): 
[{(Business, Airline), (Company, Air France), (Quality, Economy), 
(Cost, 60), (Depart, Edinburgh), (Arrive, Paris)}, {(Business, Hotel), 
(Company, Continental), (Quality, 3*), (Cost, 110), (Location, 
Paris), (Nights, 3)}, {(Business, Airline), (Company, Air France), 
(Quality, Economy), (Cost, 60), (Depart, Paris), (Arrive, Monte Carlo)}, 
{(Business, Hotel), (Company, Continental), (Quality, 2*), (Cost, 250), 
(Location, Monte Carlo), (Nights, 2)}, {(Business, Airline), (Company, 
KLM), (Quality, Economy), (Cost, 50), (Depart, Monte Carlo), (Arrive, 
London)}, {(Business, Hotel), (Company, Continental), (Quality, 3*), 
(Cost, 250), (Location, London), (Nights, 4)}, {(Business, Airline), 
(Company, Air Espana), (Quality, First), (Cost, 90), (Depart, London), 
(Arrive, Edinburgh)}] 

Figure 2.22: Semantic Filter: Shows the numerical semantic descriptions, of the simulated 
services (Agents) and user requests, in a human readable form. The semantic filter translates 
numerical semantic descriptions for one community within the user base, showing it in the 
context of the travel industry. The simulation still operated on the numerical representation for 
operational efficiency, but the semantic filter essentially assigns meaning to the numbers. 

The simulation of the Digital Ecosystem complies with the Ecosystem-Oriented Architecture
defined in the previous section, but there was the possibility of model error in the business
ecosystems of the user base (Small and Medium sized Enterprises from Digital Business
Ecosystems [222]). The abstract numerical definition for the simulated semantic descriptions,
of the services and requests the users provide, makes it widely applicable, but it was
unclear whether it could accurately represent business services. So we created a semantic filter to
show the numerical semantic descriptions, of the simulated services (Agents) and user requests,
in a human readable form. The basic properties of any business process are cost, quality, and
time [68]; so this was followed in the semantic filter. The semantic filter translates numerical
semantic descriptions for one community within the user base, showing it in the context of 
the travel industry, as shown in Figure 2.22. The simulation still operated on the numerical 
representation for operational efficiency, but the semantic filter essentially assigns meaning to 
the numbers. The output from the semantic filter, in Figure 2.22, shows that the numerical 
semantic descriptions are a reasonable modelling assumption that abstracts sufficiently rich 
textual descriptions of business services. 
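
As an illustration of the translation step, the following is a minimal sketch of a semantic filter. The lookup tables are assumptions read off the Agent example in Figure 2.22, and the rule that a cost is displayed as ten times its numeric value is inferred from the costs shown in that figure; none of the names come from the simulation itself.

```python
# Hypothetical lookup tables, read off the Agent example in Figure 2.22.
ATTRIBUTE_NAMES = {
    1: "Business",
    2: "Company",
    3: "Quality",
    4: "Cost",
    5: "Depart",
    6: "Arrive",
}
VALUE_NAMES = {
    (1, 25): "Airline",
    (2, 35): "British Midland",
    (3, 55): "Economy",
    (5, 37): "Edinburgh",
    (6, 12): "London",
}


def semantic_filter(description):
    """Translate numeric (attribute, value) tuples into readable pairs."""
    readable = []
    for attribute, value in description:
        name = ATTRIBUTE_NAMES.get(attribute, f"attribute {attribute}")
        if name == "Cost":
            # Assumption: Figure 2.22 shows each cost as ten times its number.
            meaning = value * 10
        else:
            # Unknown values are passed through unchanged.
            meaning = VALUE_NAMES.get((attribute, value), value)
        readable.append((name, meaning))
    return readable


agent = [(1, 25), (2, 35), (3, 55), (4, 6), (5, 37), (6, 12)]
readable_agent = semantic_filter(agent)
```

The filter only changes how a description is displayed; the numeric tuples remain the representation on which the simulation operates.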



2.3.5 Evolutionary Dynamics 

[Graph: fitness plotted against Generation]



Figure 2.23: Graph of Fitness in the Evolutionary Process: This shows both the maximum 
and average fitness increasing over the generations of a typical Population, and as expected 
the average fitness remains below the maximum fitness because of variation in the Population 
[110], showing that the evolutionary processes, which construct order in the Digital Ecosystem, 
are operating satisfactorily. 

We plotted the fitness of the evolutionary process for a typical Population, to ensure that 
the core process that creates order within the Digital Ecosystem was operating satisfactorily. 
The graph in Figure 2.23 shows both the maximum and average fitness increasing over the 
generations of a typical Population, and as expected the average fitness remains below the 
maximum fitness because of variation in the Population [110], showing that the evolutionary 
processes, which construct order in the Digital Ecosystem, are operating satisfactorily. 

2.3.6 Ecological Succession 

We then compared some of the Digital Ecosystem's dynamics with those of biological 
ecosystems, to determine if it had been imbued with the properties of biological ecosystems. 
A biological ecosystem develops from a simpler to a more mature state, by a process of 
succession, where the genetic variation of the populations changes with time [29]. So, it 
becomes increasingly more complex through this process of succession, driven by the evolution 
of the populations within the ecosystem [59]. Equivalently, the Digital Ecosystem's increasing 
complexity comes from the Agent Populations being evolved to meet the dynamic selection 
pressures created by the user requests. 




Figure 2.24: Ecological Succession (modified from [69]): The formation of a mature ecosystem 
is the slow, predictable, and orderly changes in the composition and structure of an ecological 
community, for which there are defined stages in the increasing complexity [29], as shown. So, 
it becomes increasingly more complex through this process of succession, driven by the evolution 
of the populations within the ecosystem [59]. 

The formation of a mature ecosystem, ecological succession, is the slow, predictable, and orderly 
changes in the composition and structure of an ecological community, for which there are defined
stages in the increasing complexity [29], as shown in Figure 2.24. Succession may be initiated 
either by the formation of a new, unoccupied habitat (e.g., a lava flow or a severe landslide) 
or by some form of disturbance (e.g. fire, logging) of an existing community. The former case 
is often called primary succession, and the latter secondary succession [29]. The trajectory of 
ecological change can be influenced by site conditions, by the interactions of the species present, 
and by more stochastic factors such as availability of colonists or seeds, or weather conditions 
at the time of disturbance. Some of these factors contribute to predictability of successional 
dynamics; others add more probabilistic elements [112]. Trends in ecosystem and community 
properties of succession have been suggested, but few appear to be general. For example, species 
diversity almost necessarily increases during early succession upon the arrival of new species, 
but may decline in later succession as competition eliminates opportunistic species and leads 
to dominance by locally superior competitors [59]. Net Primary Productivity (footnote 3), biomass, and 
trophic level properties all show variable patterns over succession, depending on the particular 
system and site [112]. Generally, communities in early succession will be dominated by
fast-growing, well-dispersed species, but as the succession proceeds these species will tend to be 
replaced by more competitive species [29]. 

We then considered existing theories of complexity for ecological succession and how they would
apply to Digital Ecosystems, seeking a high-level understanding that would apply equally to
both biological and digital ecosystems. As succession leads the communities of an ecosystem
to states of dynamic equilibrium (footnote 4) within the environment [29], the complexity has to increase
initially or there would be no ecosystem, and presumably this increase eventually stops, because
there must be a limit to how many species can be supported. The period in between is more
complicated. If we consider the neutral biodiversity theory [136], which basically states that network
aspects of ecosystems are negligible, we would probably get a relatively smooth progression,
because although there would be occasional extinctions, they would be randomly isolated events
whose frequency would eventually balance arrivals, not self-organised crashes as in systems
theory. In systems theory [91], when a new species arrives in an ecological network, it can
create a positive feedback loop that destabilises part of the network and drives some species
to extinction. Ecosystems are constantly being perturbed, so it is reasonable to assume that a
species that persists will probably be involved in a stabilising interaction with other species. So,
the whole ecological network evolves to resist invasion. That would lead to a spiky succession
process, perhaps getting less spiky over time.

3 Net Primary Productivity is the net flux of carbon from the atmosphere into green plants per unit time [168].

4 Dynamic Equilibrium is when opposing forces of a system are proceeding at the same rate, such that its state
is unchanging with time [29].

So, which theory is more applicable to the Digital Ecosystem depends on the extent to which
the species in the ecosystem act as independent, competing entities (smooth succession) [136]
versus tightly co-adapted ecological partners (spiky succession) [91]. Our Digital Ecosystem,
despite its relative complexity, is quite simple compared to biological ecosystems. It has the 
essential and fundamental processes, but no sophisticated social mechanisms. Therefore, the 
smooth succession of the neutral biodiversity theory [136] is more probable. 

Figure 2.25: Graph of Succession in the Digital Ecosystem: The formation of a mature biological
ecosystem, ecological succession, is a relatively slow process [29], and the simulated Digital
Ecosystem acted similarly in reaching a mature state. Still, at the end of the simulation run,
the Agent-sequences had evolved and migrated over an average of only ten user requests per
Habitat, and collectively had already reached near 70% effectiveness for the user base.

As the increasing complexity of the Digital Ecosystem comes from its evolving Agent
Populations responding to user requests, the effectiveness of the evolved Agent-sequences
(responses) can provide a measure of its complexity over time. So, in simulation we measured
the effectiveness of its responses over a thousand user requests, i.e. until it had reached a mature
state like a biological ecosystem [29], and graphed a typical run in Figure 2.25. The range and
diversity of Agents at initial deployment were such that 70% fulfilment of user requests was
possible, increasing to 100% fulfilment as more Agents were deployed. The Digital Ecosystem 
performed as expected, adapting and improving over time, reaching a mature state as seen 
in the graph of Figure 2.25. The succession of the Digital Ecosystem followed the smooth 
succession of the neutral biodiversity theory [136], shown by the tight distribution and equal 
density of the points around the best fit curve of the graph in Figure 2.25. The variation in the 
percentage responsiveness, over the successive user request events, came from the differential 
rates of adaption at the Habitats. Still, by the end of the simulation run, the Agent-sequences 
had evolved and migrated over an average of only ten user requests per Habitat, and collectively 
had already reached near 70% effectiveness for the user base. The formation of a mature 
biological ecosystem, ecological succession, is a relatively slow process [29], and the simulated 
Digital Ecosystem acted similarly in reaching a mature state. 



2.3.7 Species Abundance 



In ecology, relative abundance is a measure of the proportion of all organisms in a community 
belonging to a particular species [30]. A relative abundance distribution provides the inequalities 
in population size within an ecosystem and therefore an indicator of biodiversity, with the 
distribution of most biological ecosystems taking a log-normal form [30]. So, for Digital 
Ecosystems this measures globally the abundance of different solutions relative to one another. 

A snapshot of the Agents (organisms) within the Digital Ecosystem, for a typical simulation 
run, was taken after a thousand user requests, i.e. once it had reached a mature state. In 
biology a species is a series of populations within which significant gene flow can and does
occur, and so consists of groups of organisms showing a very similar genetic makeup [168]. We therefore chose 
to define species within Digital Ecosystems similarly, as a grouping of genetically similar digital 
organisms (based on their semantic descriptions), with no more than 10% variation within the 
species group. Relative abundance was calculated for each species and grouped by frequency 
in Figure 2.26. In contrast to expectations from biological ecosystems, relative abundance in 
the Digital Ecosystem did not conform to the expected log-normal [30]. We suggest that the 
high frequency for the lowest relative abundance was caused by the dynamically re-configurable 
topology of the Habitat network, which allowed species of small abundance to survive as their 
respective Habitats were clustered by the Digital Ecosystem. Therefore, it also most likely 
skewed the other frequencies of the relative abundance measure. 
Figure 2.26: Graph of Relative Abundance in the Digital Ecosystem: Relative abundance is 
a measure of the proportion of all organisms in a community belonging to a particular species 
[30]. A relative abundance distribution provides the inequalities in population size within an 
ecosystem and therefore an indicator of biodiversity, with the distribution of most biological 
ecosystems taking a log-normal form [30]. However, the Digital Ecosystem did not conform to 
the expected log-normal. 
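
The species grouping and relative abundance calculation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: variation between two semantic descriptions is taken to be the fraction of positions at which they differ, and Agents are grouped greedily against the first member of each species. Neither choice is specified in the text, and all function names are hypothetical.

```python
from collections import Counter


def variation(a, b):
    """Fraction of positions at which two descriptions differ (assumed measure)."""
    if len(a) != len(b):
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_species(agents, threshold=0.10):
    """Greedy grouping: join the first species whose representative is close enough."""
    representatives = []
    labels = []
    for agent in agents:
        for index, representative in enumerate(representatives):
            if variation(agent, representative) <= threshold:
                labels.append(index)
                break
        else:
            # No existing species is within the threshold, so start a new one.
            representatives.append(agent)
            labels.append(len(representatives) - 1)
    return labels


def relative_abundance(labels):
    """Proportion of all Agents belonging to each species."""
    total = len(labels)
    return {species: count / total for species, count in Counter(labels).items()}


base = tuple(range(10))
near = base[:9] + (99,)        # differs at 1 of 10 positions, i.e. 10% variation
far = tuple(range(10, 20))     # differs at every position
labels = assign_species([base, near, base, far])
abundance = relative_abundance(labels)
```

Counting how many species share each relative abundance value then gives the frequency grouping plotted in Figure 2.26.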



2.3.8 Species-Area Relationship 

In ecology the species-area relationship measures diversity relative to the spatial scale, showing 
the number of species found in a defined area of a particular habitat or habitats of different 
areas [288], and is commonly found to follow a power law in biological ecosystems [288]. For 
Digital Ecosystems this relationship represents how similar solutions are to one another at 
different Habitat scales. 

Figure 2.27: Graph of Species-Area in the Digital Ecosystem: In ecology the species-area
relationship measures diversity relative to the spatial scale, showing the number of species
found in a defined area of a particular habitat or habitats of different areas [288], and is
commonly found to follow a power law in biological ecosystems, which the Digital Ecosystem
also demonstrates.

Again, a snapshot of the Agents (organisms) within the Digital Ecosystem, for a typical
simulation run, was taken once it had reached a mature state, after a thousand user requests.
For this experiment, we assumed each Habitat to have an area of one unit. Then, the number
of species, at n randomly chosen Habitats, was measured, where n ranged between one and
a hundred. For each n, ten sets of measurements were taken at different random sets of
Habitats to calculate averaged results, and the log10 values of these results are depicted in the
graph of Figure 2.27. The distribution of species diversity over a spatial scale in the Digital
Ecosystem demonstrates behaviour similar to biological ecosystems, also following a power law 
[288]. However, diversity at fine spatial scales appears to be lower than predicted by the line of 
best fit. This may be explained by higher specialisation at some Habitats, making them more 
like micro-habitats in terms of a reduced species diversity [168]. 
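
The sampling procedure and the power-law fit described above can be sketched as follows. This is a minimal illustration, assuming each Habitat is represented as a non-empty set of species identifiers and that the line of best fit is an ordinary least-squares fit on log10 values; the function names and the toy data are hypothetical.

```python
import math
import random


def species_count(habitats, sample):
    """Number of distinct species present across the sampled Habitats."""
    return len(set().union(*(habitats[i] for i in sample)))


def species_area_curve(habitats, max_n, repeats=10, rng=None):
    """Mean species count at n randomly chosen Habitats, for n = 1..max_n."""
    rng = rng or random.Random(0)
    curve = []
    for n in range(1, max_n + 1):
        counts = [
            species_count(habitats, rng.sample(range(len(habitats)), n))
            for _ in range(repeats)
        ]
        # Each Habitat has unit area, so n Habitats cover an area of n.
        curve.append((n, sum(counts) / repeats))
    return curve


def fit_power_law(curve):
    """Least-squares line through (log10 area, log10 species): S = c * A**z."""
    xs = [math.log10(area) for area, _ in curve]
    ys = [math.log10(species) for _, species in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, slope


# Toy data: one unique species per Habitat, so species count equals area.
habitats = [{i} for i in range(20)]
curve = species_area_curve(habitats, max_n=20)
c, z = fit_power_law(curve)
```

On the toy data every Habitat contributes one new species, so the fitted exponent z is one; Habitats that share species would give a smaller exponent.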



2.4 Summary and Discussion 



We started by reviewing existing digital ecosystems, and then introduced biomimicry in 
computing, Nature Inspired Computing, to create a definition that could be called the 
digital counterpart of biological ecosystems. Then, by comparing and contrasting the relevant 
theoretical ecology with the anticipated requirements of Digital Ecosystems, we examined 
how ecological features may emerge in some systems designed for adaptive problem solving. 
Specifically, we suggested that Digital Ecosystems, like biological ecosystems, will consist of
self-replicating agents that interact both with one another and with an external environment
[29]. Population dynamics and evolution, spatial and network interactions, and complex
dynamic fitness landscapes will all influence the behaviour of these systems. Many of these 
properties can be understood via well-known ecological models [185, 136], with a further body 
of theory that treats ecosystems as Complex Adaptive Systems [173]. These models provide 
a theoretical basis for the occurrence of self-organisation, in digital and biological ecosystems, 
resulting from the interactions among the agents and their environment, leading to complex
non-linear behaviour [185, 136, 173]; and it is this property that provides the underlying potential 
for scalable problem-solving in digital environments. Based on the theoretical ecology, we 
considered fields from the domain of computer science, relevant in the creation of Digital 
Ecosystems. As we required the digital counterparts for the behaviour and constructs of 
biological ecosystems, and not their simulation or emulation, we considered parallels using 
existing and developing technologies to provide their equivalents. This included elements from 
mobile agent systems [249] to provide a parallel to the agents of biological ecosystems and 
their migration to different habitats, and distributed evolutionary computing [178] and
Service-Oriented Architectures [228] for the distribution and evolution of these migrating agents in 
evolving populations. 

Our efforts culminated in the definition of Ecosystem-Oriented Architectures for the creation 
of Digital Ecosystems, where the Digital Ecosystem supports the automatic combining of 
numerous Agents (which represent services), by their interaction in evolving Populations to meet 
user requests for applications, in a scalable architecture of distributed interconnected Habitats. 
Agents travel along the peer-to-peer connections; in every node (Habitat) local optimisation 
is performed through an evolutionary algorithm, where the search space is determined by the 
Agents present at the node. The sharing of Agents between Habitats ensures the system 
is scalable, while maintaining a high evolutionary specialisation for each user. The network 
of interconnected Habitats is equivalent to the physical environment of biological ecosystems 
[29] and - combined with the Agents, the Populations, the Agent migration for distributed 
evolutionary computing, and the environmental selection pressures provided by the user base - 
the union of the Habitats creates the Ecosystem-Oriented Architecture of a Digital Ecosystem. 
Continuous and varying user requests for applications provide a dynamic evolutionary pressure 
on the Agent-sequences, which have to evolve to better fulfil those requests, and without which 
there would be no driving force to the evolutionary self-organisation of the Digital Ecosystem. 
This represents a novel, cutting-edge approach to distributed evolutionary computing, because 
instead of having multiple populations sharing solutions to find the optimal solution for 
one problem, there are multiple populations to find optimal solutions for multiple similar 
problems. The business ecosystem of Small and Medium sized Enterprises from Digital Business 
Ecosystems [222] was considered as an example user base, because of their adoption of the 
ecosystems paradigm, and because our use of Service-Oriented Architectures in defining Digital 
Ecosystems predisposes them to business-to-business interaction scenarios. We have also 
dealt with critical issues which would otherwise cripple our complex system and prevent it 
from providing a scalable solution, like bloat in evolutionary processes and points-of-failure in 
networking topologies. In essence, we are making a system greater than the sum of its parts, 
expected to show emergent and complex behaviour that cannot be predicted until it is created. 

In simulation, we compared the Digital Ecosystem's dynamics to those of biological ecosystems. 
The ecological succession, measured by the responsiveness to user requests, conformed to 
expectations from biological ecosystems [136]: improving over time, before approaching a 
plateau. As the evolutionary self-organisation of an ecosystem is a slow process [29], even 
the accelerated form present in Digital Ecosystems reached only 70% responsiveness, showing 
potential for improvement. In the species abundance experiment the Digital Ecosystem did 
not conform to the log-normal distribution usually found in biological ecosystems [30]. The 
high frequency for the lowest relative abundance was probably caused by the dynamically 
re-configurable topology of the Habitat network, which allowed species of small abundance 
to survive as their Habitats were clustered by the Digital Ecosystem. In the species-area 
experiment, which measures diversity relative to spatial scale, the Digital Ecosystem did follow 
the power law commonly found in biological ecosystems [288]. The species diversity at fine 
spatial scales was lower than predicted by the line of best fit, and may be explained by 
the high specialisation at some Habitats, making them more like micro-habitats, including 
a reduced species diversity [168]. The majority of the experimental results indicate that Digital 
Ecosystems behave like their biological counterparts, and suggest that incorporating ideas from 
theoretical ecology can contribute to useful self-organising properties in Digital Ecosystems, 
which can assist in generating scalable solutions to complex dynamic problems. 

Figure 2.28: Hypothetical Abstract Ecosystem Definition: If there were an abstract ecosystem
class in the Unified Modelling Language, then the Digital Ecosystem and biological ecosystem
classes would both inherit from the abstract ecosystem class, but implement its attributes
differently. So, we would argue that the apparent compromises in mimicking biological
ecosystems are actually features unique to Digital Ecosystems.

Creating the digital counterpart of biological ecosystems was not without apparent compromises;
the temporary one-to-one genotype-phenotype mapping for Agents, the information-centric
dynamically re-configurable network topology, and the species abundance result are
inconsistent with biological ecosystems. Initially, any newly deployed Agent has a one-to-one
genotype-phenotype mapping [281], until sufficient usage (phenotype) information is amassed
for use in fitness functions. While the use of such a mapping is undesirable, it is temporary,
and necessary to allow the Digital Ecosystem to operate. The Digital Ecosystem requires a
re-configurable network topology, to support the constantly changing multi-objective
information-centric selection pressures of the user base. Hence, using the concept of Hebbian learning 
[132], Habitat connectivity is dynamically adapted based on the observed migration paths 
of the Agents within the Habitat network. The dynamically re-configurable network topology 
probably caused the Digital Ecosystem not to conform, in the species abundance experiment, to 
the log-normal distribution expected from biological ecosystems [30]. We would argue that these 
differences are not compromises, but features unique to Digital Ecosystems. As we discussed 
earlier, biomimicry, when done well, is not slavish imitation; it is inspiration using the principles 
which nature has demonstrated to be successful design strategies [33]. Hypothetically, if there 
were an abstract definition of an ecosystem, defined as an abstract ecosystem class, then 
the Digital Ecosystem and biological ecosystem classes would both inherit from the abstract 
ecosystem class, but implement its attributes differently, as shown in the Unified Modelling 
Language class diagram of Figure 2.28. So, we would argue that the apparent compromises in 
mimicking biological ecosystems are actually features unique to Digital Ecosystems. 
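
The inheritance relationship in this hypothetical class diagram can be sketched in code. The example below is purely illustrative: the two attributes chosen, `environment` and `selection_pressure`, are assumptions picked from properties discussed in this chapter, and are not the attributes of Figure 2.28.

```python
from abc import ABC, abstractmethod


class Ecosystem(ABC):
    """Hypothetical abstract ecosystem; the attribute names are illustrative only."""

    @abstractmethod
    def environment(self) -> str:
        """Where the populations live."""

    @abstractmethod
    def selection_pressure(self) -> str:
        """What drives the evolution of the populations."""

    def describe(self) -> str:
        return (
            f"{type(self).__name__}: {self.environment()}, "
            f"selected by {self.selection_pressure()}"
        )


class BiologicalEcosystem(Ecosystem):
    def environment(self) -> str:
        return "physical habitats"

    def selection_pressure(self) -> str:
        return "the physical environment and competition"


class DigitalEcosystem(Ecosystem):
    def environment(self) -> str:
        return "a re-configurable network of Habitats"

    def selection_pressure(self) -> str:
        return "user requests"


descriptions = [e.describe() for e in (BiologicalEcosystem(), DigitalEcosystem())]
```

Both subclasses satisfy the same abstract interface while implementing it differently, which mirrors the argument that the differences are features of Digital Ecosystems rather than compromises.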

Service-oriented architectures promise to provide potentially huge numbers of services that 
programmers can combine via standardised interfaces, to create increasingly sophisticated and 
distributed applications [241]. The Digital Ecosystem extends this concept with the automatic 
combining of available and applicable services in a scalable architecture to meet user requests 
for applications. This is made possible by a fundamental paradigm shift, from a pull-oriented 
approach to a push-oriented approach. So, instead of the pull-oriented approach of generating 
applications only upon request in Service-Oriented Architectures [287], the Digital Ecosystem 
follows a push-oriented approach of distributing and composing applications pre-emptively, as 
well as upon request. Although the use of Service-Oriented Architectures in the definition of 
Digital Ecosystems provides a predisposition to business [158], it does not preclude other more 
general uses. The Ecosystem-Oriented Architecture definition of Digital Ecosystems is intended 
to be inclusive and interoperable with other technologies, in the same way that the definition 
of Service-Oriented Architectures is with grid computing and other technologies [287]. For 
example, Habitats could be executed using a distributed processing arrangement, such as grid 
computing [287], which would be possible because the Habitat network topology is
information-centric (instead of location-centric). 

In this chapter we have determined the fundamentals for a new class of system, Digital 
Ecosystems, created through combining understanding from theoretical ecology, evolutionary 
theory, Multi-Agent Systems, distributed evolutionary computing, and Service-Oriented 
Architectures. The word ecosystem is more than just a metaphor since it is the digital 
counterpart of biological ecosystems. Therefore, Digital Ecosystems have their desirable 
properties, such as scalability and self-organisation, and are complex systems that show 
emergent behaviour, since they are more than the sum of their constituent parts. Once we 
have further investigated its self-organising properties in the next chapter, Chapter 3, we will 
attempt its optimisation, for which the experimental results have shown there is potential, in 
the following chapter, Chapter 4. 
Chapter 3 

Investigation of Digital Ecosystems 



In this chapter we investigate the self-organising behaviour of Digital Ecosystems, because a 
primary motivation for our research is to exploit the self-organising properties of biological 
ecosystems. Over time a biological ecosystem becomes increasingly self-organised through the 
process of ecological succession, driven by the evolutionary self-organisation of the populations 
within the ecosystem. Analogously, a Digital Ecosystem's increasing self-organisation comes 
from the Agent Populations within being evolved to meet the dynamic selection pressures 
created by the requests from the user base. We start by discussing the relevant literature 
on self-organisation, including the philosophical meaning of organisation and of self, before 
focusing on its application to evolving Agent Populations. The self-organisation of biological 
ecosystems is often defined in terms of the complexity, stability, and diversity. So, we investigate
and extend a definition for the complexity, grounded in the biological sciences, called
Physical Complexity. It is based on statistical physics, automata theory, and information theory,
and provides a measure of the quantity of information in an organism's genome, by calculating the
entropy in a population to determine the randomness in the genome. Next, we investigate and
extend a definition for the stability, originating from the computer sciences, called Chli-De Wilde
stability, which views a Multi-Agent System as a discrete time Markov chain with potentially
unknown transition probabilities, with a Multi-Agent System being considered stable when
its state has converged to an equilibrium distribution. Finally, we investigate a definition for 
the diversity, relative to the selection pressures provided by the user requests, considering the 
collective self-organised diversity of the evolving Agent Populations relative to the global user 
request behaviour. We conclude with a summary and discussion of the achievements, including 
the experimental results. 
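
The notion of stability as convergence to an equilibrium distribution can be illustrated with a minimal sketch. It assumes a small discrete time Markov chain whose transition probabilities are known, which is a simplification, since the text allows them to be unknown; the two-state chain and the function names are hypothetical.

```python
def step(distribution, transition):
    """One step of the chain: new[j] = sum over i of distribution[i] * P[i][j]."""
    size = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]


def converges(initial, transition, tolerance=1e-9, max_steps=10_000):
    """Iterate until the state distribution stops changing, or give up."""
    current = list(initial)
    for _ in range(max_steps):
        following = step(current, transition)
        if max(abs(a - b) for a, b in zip(current, following)) < tolerance:
            return True, following
        current = following
    return False, current


# Hypothetical two-state chain with known transition probabilities.
P = [[0.9, 0.1],
     [0.5, 0.5]]
stable, equilibrium = converges([1.0, 0.0], P)

# A chain that alternates between two states never settles from this start.
periodic = [[0.0, 1.0],
            [1.0, 0.0]]
periodic_stable, _ = converges([1.0, 0.0], periodic, max_steps=100)
```

For the first chain the distribution settles at five sixths and one sixth, so it would count as stable in this sense, whereas the alternating chain does not converge from the chosen starting state.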
3.1 Background Theory 

Self-organisation is perhaps one of the most desirable features in the systems that we design, 
and a primary motivation for our research in Digital Ecosystems is the desire to exploit 
the self-organising properties of biological ecosystems [173], which are thought to be robust, 
scalable architectures that can automatically solve complex, dynamic problems. Over time 
a biological ecosystem becomes increasingly self-organised through the process of ecological 
succession [29], driven by the evolutionary self-organisation of the populations within the 
ecosystem. Analogously, a Digital Ecosystem's increasing self-organisation comes from the 
Agent Populations within being evolved to meet the dynamic selection pressures created by 
the requests from the user base. The self-organisation of biological ecosystems is often defined 
in terms of the complexity, stability, and diversity [150], which we will also apply to our Digital 
Ecosystems. 

It is important for us to be able to understand, model, and define self-organising behaviour,
determining macroscopic variables to characterise the self-organising behaviour of the
order-constructing processes within the evolving Agent Populations. However, existing definitions of 
self-organisation may not be directly applicable, because evolving Agent Populations possess 
properties of both computing systems (e.g. agent systems) as well as biological systems (e.g. 
population dynamics), and the combination of these properties makes them unique. So, to 
determine definitions for the self-organising complexity, stability, and diversity we will start by 
considering the available literature on self-organisation, for its general properties, its application 
to Multi-Agent Systems (the dominant technology in Digital Ecosystems), and its application 
to our evolving Agent Populations. 



3.1.1 Self-Organisation 

The concept of self-organisation has been around since the late 1940s [11], but has escaped general
formalisation despite many attempts [231, 152]. There have instead been many notions and definitions of 
self-organisation, useful within their different contexts [133]. They have come from cybernetics 
[11, 28, 134], thermodynamics [231], mathematics [171], information theory [282], synergetics 
[125], and other domains [170]. The term self-organising is widely used, but there is no generally 
accepted meaning, as the abundance of definitions would suggest. Therefore, the philosophy of 
self-organisation is complicated, because organisation has different meanings to different people. 
So, we would argue that any definition of self-organisation is context dependent, in the same 
way that a choice of statistical measure is dependent on the data being analysed. 

Proposing a definition for self-organisation faces the cybernetics problem of defining system, 
the cognitive problem of perspective, the philosophical problem of defining self, and the context 
dependent problem of defining organisation [107]. 

The system in this context is an evolving Agent Population, with the replication of individuals 
from one generation to the next, the recombination of the individuals, and a selection pressure 
providing a differential fitness between the individuals, which is behaviour common to any 
evolving population [29]. 

Perspective can be defined as the perception of the observer in perceiving the self-organisation 
of a system [12, 28], matching the intuitive definition of "I will know it when I see it" [283], 
which despite making formalisation difficult shows that organisation is perspective dependent 
(i.e. relative to the context in which it occurs). In the context of an evolutionary system, the 
observer does not exist in the traditional sense, but is the selection pressure imposed by the 
environment, which selects individuals of the population over others based on their observable 
fitness. Therefore, consistent with the theoretical biology [29], in an evolutionary system the 
self-organisation of its population is from the perspective of its environment. 

Whether a system is self-organising or being organised depends on whether the process causing 
the organisation is an internal component of the system under consideration. This intuitively 
makes sense, and therefore requires one to define the boundaries of the system being considered 
to determine if the force causing the organisation is internal or external to the system. For an 
evolving population the force leading to its organisation is the selection pressure acting upon it 
[29], which is formed by the environment of the population's existence and competition between 
the individuals of the population [29]. As these are internal components of an evolving Agent 
Population [29], it is a self-organising system. 

Now that we have defined, for an evolving Agent Population, the system for which its 
organisation is context dependent, the perspective to which it is relative, and the self by which 
it is caused, a definition for its self-organisation can be considered. The context, an evolving 
Agent Population in its environment, lacks a 2D or 3D metric space, so it is necessary to consider 
a visualisation in a more abstract form. We will let a single square, □, represent an Agent, 
with colours to represent different Agents. Agent-sequences will therefore be represented by a 
sequence of coloured squares, □□□□, with a Population consisting of multiple Agent-sequences, 
as shown in Figure 3.1. 

Figure 3.1: Visualisation of Self-Organisation in Evolving Agent Populations: The number 
of Agents, in total and of each colour, is the same in both populations. However, the Agent 
Population on the left intuitively shows organisation through the uniformity of the colours across 
the Agent-sequences, whereas the population to the right shows little or no organisation. 

In Figure 3.1 the number of Agents, in total and of each colour, is the same in both populations. 
However, the Agent Population on the left intuitively shows organisation through the uniformity 
of the colours across the Agent-sequences, whereas the population to the right shows little or 
no organisation. Following biological ecosystems, for which self-organisation is defined in terms of the 
complexity, stability, and diversity relative to the perspective of the selection pressure [150]: the 
self-organised complexity of the system is the creation of coherent patterns and structures from 
the Agents, the self-organised stability of the system is the resulting stability or instability that 
emerges over time in these coherent patterns and structures, and the self-organised diversity of 
the system is the optimal variability within these coherent patterns and structures. 



3.1.2 Definitions of Self-Organisation 

Many alternative definitions have been proposed for self-organisation within populations and 
agent systems, with each defining what property or properties demonstrate self-organisation. 
So, we will now consider the most applicable alternatives for their suitability in defining the 
self-organised complexity, stability, and diversity of an evolving Agent Population. 

One possibility would be the ε-machine definition of evolving populations, which models the 
emergence of organisation in pre-biotic evolutionary systems [64]. An ε-machine consists of 
a set of causal states and transitions between them, with symbols of an alphabet labelling 
the transitions and consisting of two parts: an input symbol that determines which transition 
to take from a state, and an output symbol which is emitted on taking that transition [64]. 
ε-machines have several key properties [65]: all their recurrent states form a single, strongly 
connected component, their transitions are deterministic in the specific sense that a causal 
state with the edge symbol-pair determines the successor state, and an ε-machine is the 
smallest causal representation of the transformation it implements. The ε-machine definition 
of self-organisation also identifies the forms of complexity, stability, and diversity [64], but with 
definitions focused on pre-biotic evolutionary systems, i.e. the primordial soup of chemical 
replicators from the origin of life [257]. Complexity is defined as a form of structural-complexity, 
measuring the state-machine-based information content of the ε-machine individuals of a 
population [64]. Stability is defined as a meta-machine, a set (composition) of ε-machines, 
that can be regarded as an autonomous and self-replicating entity [64]. Diversity is defined, 
using an interaction network, as the variability of interaction in a population [64]. So, while 
these definitions of self-organisation are compatible at the higher more abstract level, i.e. in 
the forms of self-organisation present, the deeper definitions of these forms are not applicable 
because they are context dependent. As we explained in the previous subsection, definitions of 
self-organisation are context dependent, and so the context of pre-biotic evolutionary systems, 
to which the ε-machine self-organisation applies, is very different to the context of an evolving 
Agent Population from our Digital Ecosystem. Evolving Agent Populations are defined from 
Ecosystem-Oriented Architectures, which have evolutionarily surpassed the context of pre-biotic 
evolutionary systems, shown by the necessity of our consideration of the later evolutionary stage 
of ecological succession [29] in section 2.3.6. 

The Minimum Description Length principle [24] could be applied to the executable components 
or semantic descriptions of the Agent-sequences of a Population, with the best model, among 
a collection of tentatively suggested ones, being the one that provides the smallest stochastic 
complexity. However, the Minimum Description Length principle does not define how to select 
the family of model classes to be applied for determining the stochastic complexity [126]. This 
problem of model selection is well known and cannot be adequately formalised, and so in 
practice selection is based on human judgement and prior knowledge of the kinds of models 
previously chosen [126]. Therefore, while models could be chosen to represent the self-organised 
complexity, and possibly even the diversity, there is no procedural method for determining these 
models, because subjective human intervention is required for model selection on a case-by-case 
basis. 

The Prügel-Bennett Shapiro formalism models the evolutionary dynamics of a population of 
sequences, using techniques from statistical mechanics and focusing on replica symmetry [253]. 
The individual sequences are not considered directly, but in terms of the statistical properties 
of the population, using a macroscopic level of description with specific statistical properties 
to characterise the population, called macroscopics. A macroscopic formulation of 
an evolving population reduces the huge number of degrees of freedom to the dynamics of 
a few quantities, because a non-linear system of a few degrees of freedom can be readily 
solved or numerically iterated [253]. However, since a macroscopic description disregards a 
significant amount of information, subjective human insight is essential so that the appropriate 
macroscopics are chosen [284]. So, while macroscopics could be chosen to represent the self- 
organised complexity, stability, and diversity, there is no procedural method for determining 
these macroscopics, because subjective human insight is required for macroscopic selection on 
a case-by-case basis. 

Kolmogorov-Chaitin complexity defines the complexity of binary sequences by the smallest 
possible Universal Turing Machine algorithm (programme and input) that produces the 
sequence [177]. A sequence is said to be regular if the algorithm necessary to produce it 
on a Universal Turing Machine is shorter than the sequence itself [177]. A regular sequence 
is said to be compressible, whereas its compression, into the most succinct Universal Turing 
Machine possible, is said to be incompressible as it cannot be reduced any further in length 
[177]. A random sequence is said to be incompressible, because the Universal Turing Machine 
to represent it cannot be shorter than the random sequence itself [177]. This intuitively 
makes sense for algorithmic complexity, because algorithmically regular sequences require 
a shorter programme to produce them. So, when measuring a population of sequences, 
the Kolmogorov-Chaitin complexity would be the shortest Universal Turing Machine to 
produce the entire population of sequences. However, Chaitin himself has considered the 
application of Kolmogorov-Chaitin complexity to evolutionary systems, and realised that 
although Kolmogorov-Chaitin complexity represents a satisfactory definition of randomness 
in algorithmic information theory, it is not so useful in biology [48]. For evolving Agent 
Populations the problem manifests itself most significantly when the Agents are randomly 
distributed within the Agent-sequences of the Population, which then has maximum 
Kolmogorov-Chaitin complexity, instead of the zero complexity it ought to have. This property makes 
Kolmogorov-Chaitin complexity unsuitable as a definition for the self-organised complexity of 
an evolving Agent Population. 
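
The point can be illustrated with a rough sketch. Kolmogorov-Chaitin complexity is uncomputable, so the example below assumes the length of a zlib-compressed string as a computable upper-bound proxy for it; the four-letter Agent alphabet, the sequence length and the function name are hypothetical choices made only for this illustration.

```python
import random
import zlib


def compressed_length(sequence: str) -> int:
    """Length in bytes of the zlib-compressed sequence.

    Only an upper-bound proxy for Kolmogorov-Chaitin complexity.
    """
    return len(zlib.compress(sequence.encode("ascii"), 9))


# Hypothetical alphabet of four Agents and a fixed sequence length.
ALPHABET = "ABCD"
LENGTH = 4000

# A regular arrangement: the same short pattern repeated.
regular = ALPHABET * (LENGTH // len(ALPHABET))

# A random arrangement of exactly the same Agents (seeded for repeatability).
characters = list(regular)
random.Random(0).shuffle(characters)
shuffled = "".join(characters)

regular_size = compressed_length(regular)
random_size = compressed_length(shuffled)
print(regular_size, random_size)
```

Under these assumptions the shuffled arrangement needs far more bytes than the regular one, although both contain the same Agents, which is the behaviour that makes a compression-based measure unsuitable here.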

A definition called Physical Complexity can be estimated for a population of sequences, 
calculated from the difference between the maximal entropy of the population, and the actual 
entropy of the population when in its environment [8]. This Physical Complexity, based 
on Shannon's entropy of information, measures the information in the population about its 
environment, and therefore is conditional on its environment. It can be estimated by counting 
the number of loci that are fixed for the sequences of a population [5]. Physical Complexity 
would therefore be suitable as a definition of the self-organised complexity. However, a possible 
limitation is that Physical Complexity is currently only formulated for populations of sequences 
with the same length. 

Self-Organised Criticality in evolution is defined as a punctuated equilibrium in which the 
population's critical state occurs when the fitness of the individuals is uniform, and for which 
an avalanche, caused by the appearance and spread of advantageous mutations within the 
population, temporarily disrupts the uniformity of individual fitness across the population [17]. 
Whether an evolutionary process displays Self-Organised Criticality remains unclear. There 
are those who claim that Self-Organised Criticality is demonstrated by the available fossil 
data [292], with a power law distribution on the lifetimes of genera drawn from fossil records, 
and by artificial life simulations [4], again with a power law distribution on the lifetimes of 
competing species. However, there are those who feel that the fossil data is inconclusive, 
and that the artificial life simulations do not show Self-Organised Criticality, because the key 
power law behaviour in both can be generated by models without Self-Organised Criticality 
[229]. Also, Self-Organised Criticality does not define the resulting self-organised stability 
of the population, only the organisation of the events (avalanches) that occur in the population 
over time. 

Evolutionary Game Theory [330] is the application of models inspired from population genetics 
to the area of game theory, which differs from classical game theory [103] by focusing on the 
dynamics of strategy change more than the properties of individual strategies. In Evolutionary 
Game Theory, agents of a population play a game, but instead of optimising over strategic 
alternatives, they inherit a fixed strategy and then replicate depending on the strategy's payoff 
(fitness) [330]. The self-organisation found in Evolutionary Game Theory is the presence of 
stable steady states, in which the genotype frequencies of the population cease to change over 
the generations. This equilibrium is reached when all the strategies have the same expected 
payoff, and is called a stable steady state, because a slight perturbing will not cause a move 
far from the state. An evolutionary stable strategy leads to a stronger asymptotically stable 
state, as a slight perturbing causes only a temporary move away from the state before returning 
[330]. So, Evolutionary Game Theory is focused on genetic stability between competing 
individuals, rather than the stability of the population as a whole, which therefore limits its 
suitability for the self-organised stability of an evolving Agent Population. 
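
A minimal sketch of such a stable steady state is given below, assuming a discrete replicator update and a hypothetical two-strategy Hawk-Dove game; the payoff values, the baseline fitness and the function names are illustrative assumptions and are not taken from [330].

```python
def replicate(hawk_share, payoff, baseline):
    """One generation of discrete replicator dynamics for two strategies."""
    dove_share = 1.0 - hawk_share
    hawk_fitness = baseline + hawk_share * payoff[0][0] + dove_share * payoff[0][1]
    dove_fitness = baseline + hawk_share * payoff[1][0] + dove_share * payoff[1][1]
    mean_fitness = hawk_share * hawk_fitness + dove_share * dove_fitness
    # Strategies replicate in proportion to their relative fitness.
    return hawk_share * hawk_fitness / mean_fitness


# Hypothetical Hawk-Dove payoffs (resource value 2, cost of fighting 4).
PAYOFF = [
    [-1.0, 2.0],  # Hawk against Hawk, Hawk against Dove
    [0.0, 1.0],   # Dove against Hawk, Dove against Dove
]
BASELINE = 3.0  # assumed background fitness, keeps every fitness positive


def run(hawk_share, generations=500):
    """Iterate the replicator update for a number of generations."""
    for _ in range(generations):
        hawk_share = replicate(hawk_share, PAYOFF, BASELINE)
    return hawk_share


steady = run(0.1)               # approach the steady state
perturbed = run(steady + 0.05)  # perturb it slightly and let it settle again
print(steady, perturbed)
```

With these assumed payoffs both strategies have the same expected payoff when half the population plays Hawk, and a slight perturbation away from that share decays back to it.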

Multi-Agent Systems are the dominant computational technology in the evolving Agent 
Populations, and while there are several definitions of self-organisation [245, 188, 308, 81] and 
stability [216, 331, 238] defined for Multi-Agent Systems, they are not applicable primarily 
because of the evolutionary dynamics inherent in the context of evolving Agent Populations. 
However, Chli-DeWilde stability of Multi-Agent Systems [53] may be suitable, because it 
models Multi-Agent Systems as Markov chains, which are an established modelling approach 
in evolutionary computing [272]. A Multi-Agent System is viewed as a discrete time Markov 
chain with potentially unknown transition probabilities, in which the agents are modelled as 
Markov processes, and is considered to be stable when its state has converged to an equilibrium 
distribution [53]. Chli-DeWilde stability provides a strong notion of self-organised stability over 
time, but a possible limitation is that its current formulation does not support the necessary 
evolutionary dynamics. 
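
To make the notion of convergence to an equilibrium distribution concrete, the following sketch iterates a small discrete time Markov chain until its state distribution stops changing. It assumes known transition probabilities, whereas the definition above allows them to be unknown; the two-state matrix, the tolerance and the function names are hypothetical.

```python
def step(distribution, transition):
    """Advance a state distribution by one step of the Markov chain."""
    size = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]


def equilibrium(transition, start, tolerance=1e-12, max_steps=10_000):
    """Iterate until the distribution stops changing, or give up."""
    current = list(start)
    for _ in range(max_steps):
        following = step(current, transition)
        change = sum(abs(a - b) for a, b in zip(current, following))
        current = following
        if change < tolerance:
            return current, True
    return current, False


# Hypothetical two-state system; rows are current states and sum to one.
TRANSITION = [
    [0.9, 0.1],
    [0.5, 0.5],
]

distribution, converged = equilibrium(TRANSITION, start=[0.0, 1.0])
print(converged, distribution)
```

For this assumed matrix the distribution settles at (5/6, 1/6) whatever the starting state, which is the sense in which such a system would be called stable.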

The main concept in Mean Field Theory is that for any single particle the most important 
contribution to its interactions comes from its neighbouring particles [244]. Therefore, a 
particle's behaviour can be approximated by relying upon the mean field created by its 
neighbouring particles [244], and so Mean Field Theory could be suitable as a definition for the 
self-organised diversity of an evolving Agent Population. Naturally, it requires a neighbourhood 
model to define interaction between neighbours [244], and is therefore easily applied to domains 
such as Cellular Automata [124]. While a neighbourhood model is feasible for biological 
populations [95], evolving Agent Populations lack such neighbourhood models based on a 2D 
or 3D metric space, with the only available neighbourhood model being a distance measure 
on a parameter space measuring dissimilarity. However, this type of neighbourhood model 
cannot represent the information-based interactions between the individuals of an evolving 
Agent Population, making Mean Field Theory unsuitable as a definition for the self-organised 
diversity of an evolving Agent Population. 



3.2 Complexity 



A definition for the self-organised complexity of an evolving Agent Population should define the 
creation of coherent patterns and structures from the Agents within, with no initial constraints 
from modelling approaches for the inclusion of pre-defined specific behaviour, but capable of 
representing the appearance of such behaviour should it occur. 

None of the proposed definitions are directly applicable for the self-organised complexity of 
an evolving Agent Population. The G-machine modelling [64] is not applicable, because it is 
only defined within the context of pre-biotic populations. Neither is the Minimum Description 
Length principle [24] or the Prügel-Bennett Shapiro formalism [253], because they require the 
involvement of subjective human judgement at the critical stage of model and quantifier selection 
[126, 284]. Kolmogorov-Chaitin complexity [48] is also not applicable as randomness is given 
maximum complexity. 

Physical Complexity [8] fulfils abstractly the required definition for the self-organised complexity 
of an evolving Agent Population, estimating complexity based upon the individuals of a 
population within the context of their environment. However, its current formulation is 
problematic, primarily because it is only defined for populations of fixed length, but as this is 
not a fundamental property of its definition [8] it should be feasible to redefine and extend it 
as needed. So, the use of Physical Complexity as a definition for the self-organised complexity 
of evolving Agent Populations will be investigated further to determine its suitability. 

3.2.1 Physical Complexity 

Physical Complexity was born [5] from the need to determine the proportion of information in 
sequences of DNA, because it has long been established [307] that the information contained 
is not directly proportional to the length, known as the C-value enigma/paradox [120]. 
Understanding DNA requires knowing the environment (context) in which it exists, which 
may initially appear obvious as DNA is considered to be the language of life [279] and the 
purpose of life is to procreate or replicate [70]. Virtually all activities of biological life-forms 
are towards this aim [70], with a few exceptions (e.g. suicide before procreation), and to achieve 
replication requires resources, energy and matter to be harvested [192]. So, for any individual 
the environment represents the problem of extracting energy for replication, and so their DNA 
sequence represents a solution to this problem. Furthermore, an individual DNA solution is not 
necessarily a simple inverse of the problem that the environment represents, with forms of life 
having evolved specialised, specific and effective ways (niches) to acquire the necessary energy 
and matter for replication [168]. Even with this understanding it would seem we still need 
to define the environment to be able to distinguish the information from the redundancy in a 
solution (DNA sequence). However, because Physical Complexity analyses a group of solutions 
to the same problem, the consistency between the different solutions shows the information, and 
the differences the redundancy [6]. Entropy, a measure of disorder [321], is used to determine the 
redundancy from the information in a population of solutions. Physical Complexity therefore 
provides a context-relative definition for the self-organised complexity of a population without 
needing to define the context (environment) explicitly [7]. 

Physical Complexity was derived [7] from the notion of conditional complexity defined by 
Kolmogorov, which is different from traditional Kolmogorov complexity and states that the 
determination of complexity of a sequence is conditional on the environment in which the 
sequence is interpreted [177]. In contrast, traditional Kolmogorov-Chaitin (KC) complexity is 
only conditional on the implicit rules of mathematics necessary to interpret a programme on 
the tape of a Turing Machine (TM), and nothing else [177]. So, consider a TM that takes 
a tape e as input (which represents its physical environment), including the particular rules of 
mathematics of this world; without such a tape, this TM is incapable of computing anything, 
except for writing to the output what it reads in the input. Thus, without tape e all sequences 
s have maximal KC-complexity, because there is nothing by which to determine regularity [7]. 
However, conditional complexity can be stated as the length of the smallest programme that 
computes sequence s from an environment e, 

$$K(s \mid e) = \min\{\, |p| : s = C_T(p, e) \,\}, \tag{3.1}$$

where C_T(p, e) denotes the result of running programme p on Turing Machine T with the input 
sequence e [7]. This is not yet Physical Complexity, but rather, it is the smallest programme 
that computes the sequence s from an environment e, in the limit of sequences of infinite length, 
containing only the bits that are entirely unrelated to e, since, if they were not, they could be 
obtained from e with a programme of a size tending to zero [7]. The Physical Complexity 
K(s : e) can now be defined as the number of bits that are meaningful in sequence s (that can 
be obtained from e with a programme of vanishing size), and is given by the mutual complexity 
[154], 

$$K(s : e) = K(s \mid \emptyset) - K(s \mid e), \tag{3.2}$$

where K(s|∅) is the unconditional complexity with an empty input tape, e = ∅ [7]. This is 
different from the Kolmogorov complexity, because in Kolmogorov's construction the rules of 
mathematics were given to the TM [177]. As argued above, every sequence s is random if no 
environment e is specified, as non-randomness can only exist for a specific world or environment. 
Thus, K(s|∅) is always maximal, 

$$K(s \mid \emptyset) = |s|, \tag{3.3}$$

and is given by the length of s [7]. So (3.2) represents the length of the sequence s, minus 
those bits that cannot be obtained from e. So, conversely (3.2) represents the number of bits 
that can be obtained in a sequence s, by a computation with vanishing programme size, from 
e. Thus, K(s : e) represents the Physical Complexity of s [7]. The determination of the 
Physical Complexity, K(s : e), of a sequence s with a description of the environment e is not 
practical, meaning that it cannot generally be determined by inspection, because it is impossible 
to determine which, and how many, of the bits of sequence s correspond to information about 
the environment e. The reason is that we are generally unaware of the coding used to code 
information about e in s, and therefore coding and non-coding bits look entirely alike [7]. 
However, it is possible to distinguish coding from non-coding bits if we are given multiple 
copies of sequences that have adapted to the environment, or more generally, if a statistical 
ensemble (population) of sequences is available to us. Then, coding bits are revealed by non- 
uniform probability distributions across the population (conserved sites), whereas random bits 
have uniform distributions (volatile sites) [7]. The determination of complexity then becomes 
an exercise in information theory, because the average complexity ⟨K⟩, in the limit of infinitely 
long strings, tends to the entropy of the ensemble of strings S¹ [343], 

$$\langle K(s) \rangle_S = \sum_{s \in S} p(s) K(s) \approx H(S), \tag{3.4}$$

where H is defined from Shannon's (information) entropy [186], and is given by 

$$H(S) = \log_n(S), \tag{3.5}$$

where n is the number of symbols available for encoding. If each symbol is equally probable, 
we can rewrite the above function as 

$$H(S) = -\log_n(1/S) = -\log_n(p), \tag{3.6}$$

where p is the probability of occurrence of any one of the symbols. 

¹ This holds for near-optimal codings. For strings s that do not code perfectly we have ⟨K⟩ > H [342]. 

For a source that outputs 
an infinite sequence of bits, to communicate a finite set of symbols S, Shannon generalised the 
above function to express an average symbol length [186]. This derivation is easier to see for a 
large, but finite, number of symbols N, 



$$H(S) = \frac{\sum_{i=1}^{S} N_i \left[-\log_N(1/S_i)\right]}{\sum_{i=1}^{S} N_i} = \frac{\sum_{i=1}^{S} N_i \left[-\log_N(1/S_i)\right]}{N} = -\sum_{i=1}^{S} \frac{N_i}{N} \left[\log_N(1/S_i)\right] = -\sum_{i=1}^{S} p_i \log_N(p_i), \tag{3.7}$$

where N_i is the number of occurrences of the symbol S_i. So, given (3.4) and (3.7), the average 
complexity of the sequences s of a population S, ⟨K(s)⟩_S, tends to the entropy of the sequences 
s in the ensemble S [7], 

$$H(S) = -\sum_{s \in S} p(s) \log p(s). \tag{3.8}$$

(3.8) remains consistent with (3.3) as the determination of K(s|∅), sequence s without an 
environment e, must equal the sequence's length |s|, because Shannon's formula for entropy 
is an average logarithmic measure of the symbol sets [186], and so the maximum entropy of a 
population is equivalent to the length of the sequences in the population, H_max(S) = |s|. Indeed, 
if nothing is known about the environment to which a sequence s pertains, then according to 
the principle of indifference², the probability distribution p(s) must be uniformly random. 
However, if an environment e is given we have some information about the system, and the 
probability distribution will be nonuniform. Indeed, it can be shown that for every probability 
distribution p(s|e), to find sequence s given environment e, we have 

$$H(S \mid e) \leq H(S \mid \emptyset) = |s|, \tag{3.9}$$

because of the concavity of Shannon entropy [7]. So, the difference between the maximal entropy 
H(S|∅) = |s| and H(S|e), according to the construction outlined above, represents the average 
number of bits in sequence s taken from the population S that can be obtained by zero-length 
universal programmes from the environment e. 

² The principle of indifference states that if there are n > 1 mutually exclusive and collectively exhaustive 
possibilities, which are indistinguishable except for their names, then each possibility should be assigned an 
equal probability 1/n [143]. 

Therefore, the average mutual complexity of 
sequences s in a population S, given an environment e, is 

$$\langle K(s : e) \rangle_S = \sum_{s \in S} p(s) K(s : e) \approx H(S \mid \emptyset) - H(S \mid e) = I(S \mid e), \tag{3.10}$$

where I(S|e) is the information about the environment e stored in the population S, which we 
identify with the Physical Complexity [7]. To estimate I(S|e) it is necessary to estimate the 
entropy H(S|e) using a representative population of sequences S for a given environment e, by 
summing, over the sequences s of the population S, the probability p(s|e) multiplied by the 
logarithm of the probability p(s|e), 

$$H(S \mid e) = -\sum_{s \in S} p(s \mid e) \log p(s \mid e). \tag{3.11}$$

The entropy H(S|e) can be estimated by summing the per-site entropies H(i) of the sequence, 

$$H(S \mid e) \approx \sum_{i=1}^{|s|} H(i), \tag{3.12}$$

where i is a site in the sequence s [7]. Random sites are identified by a nearly uniform probability 
distribution, and contribute positively to the entropy, whereas non-random sites (which have 
strongly peaked distributions) contribute very little [7]. So, the Physical Complexity, the 
average mutual complexity of sequences s in a population S for an environment e, ⟨K(s : e)⟩_S, 
abbreviated as C, is the maximal entropy H(S|∅) minus the sum of the per-site entropies, 

$$C = H(S \mid \emptyset) - \sum_{i=1}^{|s|} H(i). \tag{3.13}$$

If the sequences s are constructed from an alphabet, a set D, then the per-site entropy H(i) 
for the sequences is 

$$H(i) = -\sum_{d \in D} p_d(i) \log_{|D|} p_d(i), \tag{3.14}$$

where i is a site in the sequences ranging between one and the length of the sequences ℓ, D is 
the alphabet of characters found in the sequences, and p_d(i) is the probability that site i (in the 
sequences) takes on character d from the alphabet D, with the sum of the p_d(i) probabilities 
for each site i equalling one, Σ_{d∈D} p_d(i) = 1 [7]. Taking the log to the base |D| conveniently 
normalises H(i) to range between zero and one, 

$$0 \leq H(i) \leq 1. \tag{3.15}$$

If the site i is identical across the population it will have no entropy, 

$$H_{\min}(i) = 0. \tag{3.16}$$

If the content of site i is uniformly random, i.e. the p_d(i) probabilities all equal to 1/|D|, it will 
have maximum entropy, 

$$H_{\max}(i) = 1. \tag{3.17}$$

When the entropy H(i) is at its minimum of zero, then the site i holds information, as 
every sample shows the same character of the alphabet. When the entropy H(i) is at its 
maximum of one, the character found in the site i is uniformly random and therefore holds no 
information. So, the amount of information is the maximal entropy of the site (3.17) minus the 
actual per-site entropy (3.14) [7], 

$$I(i) = H_{\max}(i) - H(i) = 1 - H(i). \tag{3.18}$$



Figure 3.2: DNA Samples from a Population: DNA sequences are made up from four 
nucleotides, Adenine (A), Thymine (T), Cytosine (C) and Guanine (G). The nucleotides 
always pair as follows, Adenine with Thymine, and Cytosine with Guanine. So, DNA 
sequences can be reduced to a genome sequence showing half of the paired information [168]. 

DNA, whose sequence encodes the genetic information of living organisms [168], was the original 
driver for the creation of Physical Complexity [5], and so is a good example upon which to 
demonstrate the definition. DNA sequences are made up from four nucleotides, Adenine (A), 
Thymine (T), Cytosine (C) and Guanine (G). The nucleotides always pair as follows, Adenine 
with Thymine, and Cytosine with Guanine. So, DNA sequences can be reduced to a genome 
sequence showing half of the paired information [168], and with a sufficiently sized sample 
population, the p_d(i) probabilities can be estimated by the frequencies of the nucleotides at the 
sites. Considering the genome samples, in Figure 3.2, the per-site entropy for site 11 will have 
maximum entropy, as the nucleotides (characters of the alphabet) all have equal probability, 

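$$H(11) = -\sum_{d \in D} p_d(11) \log_{|D|} p_d(11) = -\left(\tfrac{1}{4}\log_4\tfrac{1}{4} + \tfrac{1}{4}\log_4\tfrac{1}{4} + \tfrac{1}{4}\log_4\tfrac{1}{4} + \tfrac{1}{4}\log_4\tfrac{1}{4}\right) = 1,$$

given that the alphabet D equals {A, T, C, G}, and that the probabilities all equal a quarter, 
p_A(11) = p_T(11) = p_C(11) = p_G(11) = 1/4. As the per-site entropy (randomness) is maximum, 
the information content is its minimum of zero, 

$$I(11) = 1 - H(11) = 0.$$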

This intuitively makes sense, as it states that if the site content is random across the population, 
then it contains no information. At the other extreme, if we calculate the per-site entropy for 
site 16 in Figure 3.2, it will have no entropy, 

$$H(16) = -\sum_{d \in D} p_d(16) \log_{|D|} p_d(16) = -\left(0\log_4 0 + 1\log_4 1 + 0\log_4 0 + 0\log_4 0\right) = 0,$$

taking 0 log 0 to be zero by the usual convention, as the nucleotide Thymine has a probability 
of one, p_T(16) = 1, while the other three nucleotides 
have a probability of zero, p_A(16) = p_C(16) = p_G(16) = 0. As the per-site entropy is minimum, 
the information content is its maximum of one, 

$$I(16) = 1 - H(16) = 1.$$

This also intuitively makes sense, as it states that if the site is identical across the entire 
population (no randomness), then the site holds definitive information. Finally, the per-site 
entropy for site 19 is at neither extreme, but is entropically in the middle, 

$$H(19) = -\sum_{d \in D} p_d(19) \log_{|D|} p_d(19) = -\left(0\log_4 0 + 0\log_4 0 + \tfrac{1}{2}\log_4\tfrac{1}{2} + \tfrac{1}{2}\log_4\tfrac{1}{2}\right) = \tfrac{1}{2},$$

as the probabilities p_A(19) = p_T(19) = 0, and p_C(19) = p_G(19) = 1/2. Intuitively, this states 
that if there is some entropy (randomness) in the samples of the site, then there is only partial 
information, 

$$I(19) = 1 - H(19) = \tfrac{1}{2}.$$
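
The three worked cases above can be checked numerically with the short sketch below, which implements the per-site entropy (3.14) and information (3.18). The site columns are hypothetical samples constructed to match the probabilities stated in the text; they are not the actual samples of Figure 3.2, and the function name is an assumption.

```python
import math
from collections import Counter


def per_site_entropy(column, alphabet):
    """Per-site entropy (3.14), with the logarithm taken to base |D|."""
    counts = Counter(column)
    total = len(column)
    entropy = 0.0
    for character in alphabet:
        probability = counts[character] / total
        if probability > 0.0:  # 0 log 0 is taken as 0
            entropy -= probability * math.log(probability, len(alphabet))
    return entropy


ALPHABET = "ATCG"

# Hypothetical site columns matching the stated probabilities only.
SITES = {
    11: "ATCGATCG",  # all four nucleotides equally likely
    16: "TTTTTTTT",  # Thymine only
    19: "CGCGCGCG",  # Cytosine and Guanine, half each
}

for site, column in SITES.items():
    entropy = per_site_entropy(column, ALPHABET)
    information = 1.0 - entropy  # (3.18)
    print(site, round(entropy, 6), round(information, 6))
```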

For clarity the length of the sequences |s| will be abbreviated to ℓ [7], 

$$|s| = \ell. \tag{3.19}$$



So, the complexity of a population S, of sequences s, is the maximal entropy of the population 
(equivalent to the length of the sequences) ℓ, minus the sum, over the length ℓ, of the per-site 
entropies H(i), 

$$C = \ell - \sum_{i=1}^{\ell} H(i), \tag{3.20}$$

given (3.13), (3.9) and (3.19) [7]. The equivalence of the maximum complexity to the length 
matches the intuitive understanding that if a population of sequences of length ℓ has no 
redundancy, then their complexity is their length ℓ. 
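
As a minimal sketch of (3.20), the function below estimates the complexity of a population of equal-length sequences by subtracting the summed per-site entropies from the sequence length. The example populations and the function names are hypothetical, and the populations are far smaller than the sample sizes discussed in the text, so the numbers only illustrate the arithmetic.

```python
import math
from collections import Counter


def per_site_entropy(column, alphabet_size):
    """Per-site entropy (3.14), with the logarithm taken to base |D|."""
    counts = Counter(column)
    total = len(column)
    return -sum(
        (n / total) * math.log(n / total, alphabet_size)
        for n in counts.values()
    )


def physical_complexity(population, alphabet_size):
    """Estimate C as the length minus the summed per-site entropies (3.20)."""
    length = len(population[0])
    if any(len(sequence) != length for sequence in population):
        raise ValueError("all sequences must have the same length")
    columns = zip(*population)  # one tuple of characters per site
    return length - sum(per_site_entropy(c, alphabet_size) for c in columns)


# Hypothetical populations over the alphabet {A, T, C, G}.
identical = ["ATCG"] * 8              # every site fixed
mixed = ["TA", "TT", "TC", "TG"]      # first site fixed, second uniform

print(physical_complexity(identical, 4))
print(physical_complexity(mixed, 4))
```

The first population has no per-site entropy, so its estimated complexity equals its length of four; in the second only the fixed site contributes, giving a complexity of one.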

If G represents the set of all possible genotypes constructed from an alphabet D that are of 
length ℓ, then the size (cardinality) of G, |G|, is equal to the size of the alphabet |D| raised to the 
length ℓ, 

$$|G| = |D|^{\ell}. \tag{3.21}$$

For the complexity measure to be accurate, a sample size of |D|^ℓ is suggested to minimise 
the error [7, 26], but such a large quantity can be computationally infeasible. The definition's 
creator, for practical applications, chooses a population size of roughly 1.29|D|ℓ [8]. We suggest 
that a population size of |D|ℓ is sufficient to show any trends present, but that the population 
size will fluctuate when simulated, and so a population size slightly larger than |D|ℓ is chosen for 
simulations to ensure that the necessary minimum of |D|ℓ is maintained. So, for a population 
of sequences S we choose, with the definition's creator, a computationally feasible population 
size of |D| times ℓ, 

$$|S| \geq |D|\ell. \tag{3.22}$$

The size of the alphabet, $|D|$, depends on the domain to which Physical Complexity is applied.
For RNA the alphabet is the four nucleotides, D = {A, C, G, U}, and therefore $|D| = 4$ [7].
When Physical Complexity was applied to the Avida simulation software, there was an alphabet
size of twenty-eight, $|D| = 28$, as that was the size of the instruction set for the self-replicating
programmes [8].
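
To give a sense of scale for (3.21) and (3.22), the following sketch compares the size of the genotype space with the minimum population size for the two alphabet sizes mentioned above. The sequence length of 20 is a hypothetical value chosen only for illustration, and the function names are our own.

```python
def genotype_space_size(alphabet_size: int, length: int) -> int:
    """|G| = |D|^l, the number of possible genotypes of a given length (3.21)."""
    return alphabet_size ** length


def minimum_population_size(alphabet_size: int, length: int) -> int:
    """|D| * l, the computationally feasible minimum population size (3.22)."""
    return alphabet_size * length


# Hypothetical sequence length, chosen only for illustration.
length = 20
for name, alphabet_size in [("RNA", 4), ("Avida", 28)]:
    print(
        name,
        genotype_space_size(alphabet_size, length),
        minimum_population_size(alphabet_size, length),
    )
```

Even at this short length the genotype space is many orders of magnitude larger than $|D|\ell$, which is why sampling the full space is treated as infeasible.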



3.2.2 Extending to Agent Populations 

Reformulating Physical Complexity for an evolving Agent Population requires consideration of 
the following issues: the mapping of the sequence sites to the Agent-sequences, the managing of 
Populations of variable length sequences, and the non-atomicity of Agents leading to clustering 
within populations. 

3.2.2.1 Mapping Sequence Sites 

The first concern is mapping the Population's Agent-sequences to the sequence sites of Physical
Complexity, with the intuitive approach being to map the sites to the Agents, because they are
the functional unit of processing, the base unit for evolution in the evolving Agent Populations.
Physical Complexity has been applied to RNA sequences [7], and populations of self-replicating
programmes in the artificial life simulator Avida [172]; for the RNA the sites were mapped to
the nucleotides from which it is constructed, and for the artificial life simulator the sites were
mapped to the programme instructions which made up the self-replicating programmes. So,
the only alternative to mapping the sites to the Agents would be mapping to the programme
instructions of the executable components of the services that the Agents represent, similarly
to the populations of self-replicating programmes in the artificial life simulator Avida [172].
However, mapping to the executable components in the evolving Agent Populations would be
like mapping to the binary representation of the instruction set in the Avida simulator, or to
the molecules that make up the nucleotides in RNA, which in all cases would be unsuitable
as they are the components that make up the functional units, and not the functional units
themselves. Therefore, mapping the sequence sites of Physical Complexity to the Agents is the
most suitable approach for evolving Agent Populations.

3.2.2.2 Variable Length Sequences

Physical Complexity is currently formulated for a population of sequences of the same length [7],
and so we will now investigate an extension to include populations of variable length sequences,
which will include Populations of variable length Agent-sequences. This will require changing
and re-justifying the fundamental assumptions, specifically the conditions and limits upon
which Physical Complexity operates. In (3.20) the Physical Complexity, C, is defined for a
population of sequences of length $\ell$ [7]. The most important question is what does the length $\ell$
equal if the population of sequences is of variable length? The issue is what $\ell$ represents, which
is the maximum possible complexity for the population [7], which we will call the complexity
potential $C_P$. The maximum complexity in (3.20) occurs when the per-site entropies sum to
zero, $\sum_{i=1}^{\ell} H(i) \to 0$, as there is no randomness in the sites (all contain information), i.e.
$C \to \ell$ [7]. So, the complexity potential equals the length,

$$ C_P = \ell, \tag{3.23} $$

provided the population S is of sufficient size for accurate calculations, as found in (3.22), i.e. $|S|$
is equal to or greater than $|D|\ell$. For a population of variable length sequences, $S_V$, the complexity
potential, $C_{V_P}$, cannot be equivalent to the length $\ell$, because no single length exists. However, given
the concept of minimum sample size from (3.22), there is a length for a population of variable
length sequences, $\ell_V$, between the minimum and maximum length, such that the number of
per-site samples up to and including $\ell_V$ is sufficient for the per-site entropies to be calculated.
So the complexity potential for a population of variable length sequences, $C_{V_P}$, will be equivalent
to its calculable length,

$$ C_{V_P} = \ell_V. \tag{3.24} $$

If $\ell_V$ were to be equal to the length of the longest individual(s) $\ell_{max}$ in a population of variable
length sequences $S_V$, then the operational problem is that for some of the later sites, between
one and $\ell_{max}$, the sample size will be less than the population size $|S_V|$. So, having the length
$\ell_V$ equalling the maximum length would be incorrect, as there would be an insufficient number
of samples at the later sites, and therefore $\ell_V \neq \ell_{max}$. Consider the alternative samples of
DNA sequences shown in Figure 3.3; if the entropy is calculated again for site 19, $H(19) = 0$,
but there is an insufficient sample size for the estimated probabilities to provide an accurate
calculation. Therefore, the length for a population of variable length sequences, $\ell_V$, is the
highest value within the range of the minimum (one) and maximum length, $1 \leq \ell_V \leq \ell_{max}$,
for which there are sufficient samples to calculate the entropy.

Figure 3.3: Alternative DNA Samples of the Population from Figure 3.2: Calculating the
entropy for site 19 provides a value of zero, but as is evident there is an insufficient sample size
for the estimated probabilities to provide an accurate calculation. So, having the length of a
population of variable length sequences equalling the maximum length would be incorrect.

A function which provides the sample size at a given site is required to specify the value of
$\ell_V$ precisely,

$$ \mathit{sampleSize}(i : \mathit{site}) : \mathit{int}, \tag{3.25} $$

where the output varies between 1 and the population size $|S_V|$ (inclusive). Therefore, the
length of a population of variable length sequences, $\ell_V$, is the highest value within the range of
one and the maximum length for which the sample size is greater than or equal to the alphabet
size multiplied by the length $\ell_V$,

$$ \mathit{sampleSize}(\ell_V) \geq |D|\ell_V \wedge \mathit{sampleSize}(\ell_V + 1) < |D|\ell_V, \tag{3.26} $$

where $\ell_V$ is the length for a population of variable length sequences, and $\ell_{max}$ is the maximum
length in a population of variable length sequences, $\ell_V$ varies between $1 \leq \ell_V \leq \ell_{max}$, D is the
alphabet and $|D| > 0$. This definition intrinsically includes a minimum size for populations
of variable length sequences, $|D|\ell_V$, and therefore is the counterpart of (3.22), which is the
minimum population size for populations of fixed length.
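
The length $\ell_V$ can be found by a simple scan over the sites. The sketch below follows the prose definition (the highest site whose sample size is at least $|D|$ times that site), and assumes that sequences are left-aligned strings with sites numbered from one, so that the sample size at a site is the number of sequences at least that long. The function names, and the choice to return zero when even the first site has too few samples, are our own assumptions.

```python
def sample_size(population, site):
    """Number of sequences long enough to hold a symbol at the 1-indexed site."""
    return sum(1 for sequence in population if len(sequence) >= site)


def variable_length(population, alphabet_size):
    """Highest site l_V with sample_size(l_V) >= |D| * l_V, or 0 if none."""
    max_length = max(len(sequence) for sequence in population)
    l_v = 0
    for site in range(1, max_length + 1):
        if sample_size(population, site) >= alphabet_size * site:
            l_v = site
    return l_v


# Six sequences over a two-symbol alphabet, with lengths 3, 3, 3, 3, 2 and 1.
population = ["ABA", "ABB", "BAA", "BAB", "AB", "A"]
print(variable_length(population, alphabet_size=2))
```

Here site 2 has five samples against a requirement of four, while site 3 has four samples against a requirement of six, so the calculable length is 2 even though the longest sequence has length 3.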

The length $\ell$ used in the limits of (3.14) no longer exists, and therefore (3.14) must be updated;
so, the per-site entropy calculation for variable length sequences will be denoted by $H_V(i)$, and
is,

$$ H_V(i) = -\sum_{d \in D} p_d(i) \log_{|D|} p_d(i), \tag{3.27} $$

where D is still the alphabet, $\ell_V$ is the length for a population of variable length sequences,
with the site $i$ now ranging between $1 \leq i \leq \ell_V$, while the $p_d(i)$ probabilities still range between
$0 \leq p_d(i) \leq 1$, and still sum to one. It remains algebraically almost identical to (3.14), but
the conditions and constraints of its use will change, specifically $\ell$ is replaced by $\ell_V$. Naturally,
$H_V(i)$ ranges between zero and one, as did $H(i)$ in (3.15),

$$ 0 \leq H_V(i) \leq 1, \tag{3.28} $$

where $i$ ranges between 1 and $\ell_V$, and so the condition on the site $i$ changes at the upper limit
from $\ell$ to $\ell_V$. As before in (3.16), if a site $i$ is identical across the population, it will have no
entropy,

$$ H_V(i) = 0, \tag{3.29} $$

where, again, $i$ ranges between 1 and $\ell_V$. Analogously to (3.17), if the $p_d(i)$ probabilities in
(3.27) are equal, then the site $i$ has maximum entropy. In effect, the content of the site is
uniformly random and therefore

$$ H_V(i) = 1 \tag{3.30} $$

is true for all $i$, where the $p_d(i)$ probabilities are $\frac{1}{|D|}$, and where $i$ continues to range between 1
and $\ell_V$. Analogously to (3.18), when the entropy is its minimum of zero then the site $i$ holds
information, as every sample shows the same character of the alphabet. However, when the
entropy is maximum the character found in the site $i$ is uniformly random, and therefore holds
no information. So, the amount of information is the maximal entropy of the site (3.30) minus
the actual per-site entropy (3.27),

$$ I_V(i) = H_{V_{max}}(i) - H_V(i) = 1 - H_V(i), \tag{3.31} $$

where $i$ again ranges between 1 and $\ell_V$. Therefore, the complexity for a population of
variable length sequences, $C_V$, is the complexity potential of the population of variable length
sequences minus the sum, over the length of the population of variable length sequences, of the
per-site entropies (3.27),

$$ C_V = \ell_V - \sum_{i=1}^{\ell_V} H_V(i), \tag{3.32} $$

where $\ell_V$ is the length for the population of variable length sequences, and $H_V(i)$ is the entropy
for a site $i$ in the population of variable length sequences.
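
Putting (3.26), (3.27) and (3.32) together, a minimal sketch of the complexity for variable length sequences might look like the following. It assumes left-aligned string sequences, an alphabet of at least two symbols, and that each site's entropy is estimated only from the sequences long enough to reach that site; the function names are illustrative.

```python
import math
from collections import Counter


def site_entropy(symbols, alphabet_size):
    """H_V(i) of (3.27): entropy of the symbols sampled at one site, base |D|."""
    total = len(symbols)
    return -sum(
        (count / total) * math.log(count / total, alphabet_size)
        for count in Counter(symbols).values()
    )


def variable_length(population, alphabet_size):
    """l_V: highest 1-indexed site whose sample size is at least |D| * site."""
    l_v = 0
    for site in range(1, max(len(s) for s in population) + 1):
        samples = sum(1 for s in population if len(s) >= site)
        if samples >= alphabet_size * site:
            l_v = site
    return l_v


def variable_complexity(population, alphabet_size):
    """C_V of (3.32): l_V minus the per-site entropies of sites 1 to l_V."""
    l_v = variable_length(population, alphabet_size)
    entropy_sum = 0.0
    for index in range(l_v):  # index is the 0-based position of site index + 1
        samples = [s[index] for s in population if len(s) > index]
        entropy_sum += site_entropy(samples, alphabet_size)
    return l_v - entropy_sum


# Site 1 is uniform; site 2 is split evenly between the two symbols.
population = ["AA", "AB", "AA", "AB", "A", "A"]
print(variable_complexity(population, alphabet_size=2))
```

In this example $\ell_V = 2$, the first site contributes no entropy and the second contributes the maximum of one, so $C_V = 1$.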
[Figure 3.4 image: Population panels labelled $C_V = 4.420$, %E = 88.4 and $C_V = 0.575$, %E = 11.5]

Figure 3.4: Abstract Visualisation for Populations of Variable Length Sequences: The alphabet
D is the set {□, □, □}, the maximum length $\ell_{max}$ is 6 and the length for populations of variable
length sequences $\ell_V$ is calculated as 5 from (3.26). The Physical Complexity and Efficiency
values are consistent with the intuitive understanding one would have for the self-organised
complexity of the sample populations.

Physical Complexity can now be applied to populations of variable length sequences, so we
will consider the abstract example populations in Figure 3.4. We will let a single square, □,
represent a site $i$ in the sequences, with different colours to represent the different values.
Therefore, a sequence of sites will be represented by a sequence of coloured squares, □□□.
Furthermore, the alphabet D is the set {□, □, □}, the maximum length $\ell_{max}$ is 6 and the
length for populations of variable length sequences $\ell_V$ is calculated as 5 from (3.26). The
Physical Complexity values in Figure 3.4 are consistent with the intuitive understanding one
would have for the self-organised complexity of the sample populations; the population with
high Physical Complexity has a little randomness, while the population with low Physical
Complexity is almost entirely random.

Using our extended Physical Complexity we can construct a measure showing the use of the
information space, called the Efficiency E, which is calculated by the Physical Complexity $C_V$
over the complexity potential $C_{V_P}$,

$$ E = \frac{C_V}{C_{V_P}}. \tag{3.33} $$

The Efficiency E will range between zero and one,

$$ 0 \leq E \leq 1, \tag{3.34} $$

only reaching its maximum of one when the actual complexity $C_V$ equals the complexity
potential $C_{V_P}$, indicating that there is no randomness in the population. In Figure 3.4 the
populations of sequences are shown with their respective Efficiency values as percentages, and
the values are as one would expect.

The complexity $C_V$ (3.32) is an absolute measure, whereas the Efficiency E (3.33) is a relative
measure (based on the complexity $C_V$). So, the Efficiency E can be used to compare the
self-organised complexity of populations, independent of their size, their length, and whether
their lengths are variable or not (as it is equally applicable to the fixed length populations of the
original Physical Complexity).
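
The Efficiency values quoted in Figure 3.4 can be checked directly from (3.33), since the complexity potential there is $\ell_V = 5$. The short sketch below does this; the function name is our own.

```python
def efficiency(complexity, complexity_potential):
    """E of (3.33): the complexity C_V over the complexity potential C_Vp."""
    if complexity_potential <= 0:
        raise ValueError("complexity potential must be positive")
    return complexity / complexity_potential


# Complexity values reported for the two populations of Figure 3.4 (l_V = 5).
for c_v in (4.420, 0.575):
    print(f"C_V = {c_v}: E = {100 * efficiency(c_v, 5):.1f}%")
```

This reproduces the 88.4% and 11.5% shown in the figure.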

3.2.2.3 Clustering 

The self-organised complexity of an evolving Agent Population is the clustering, the amassing of
the same or similar sequences, around the optimum genome [29]. This can be visualised on a fitness
landscape [335], which shows the combination space (power set) of the alphabet D against the
fitness values from the selection pressure (user request). The Agent-sequences of an evolving
Population will evolve, moving across the fitness landscape and clustering around the optimal
genome at the peak of the global optimum, as shown in Figure 3.5, assuming that the evolutionary
process does not become trapped while clustering over local optima.




Figure 3.5: 3D Fitness Landscape with a Global Optimum: This shows the combination space 
(power set) of the alphabet D against the fitness values from the selection pressure (user request), 
resulting in a global optimum. The Agent-sequences of an evolving Population will evolve, 
moving across the fitness landscape and clustering around the optimal genome at the peak of 
the global optimum. 

Clustering is indicated by the Efficiency E tending to its maximum, as the population's Physical
Complexity $C_V$ tends to the complexity potential $C_{V_P}$, because an optimal sequence is becoming
dominant in the population, and therefore increasing the uniformity of the sites across the
population. With a global optimum, the Efficiency E tends to a maximum of one, indicating
that the evolving population of sequences is tending to a set of clusters T of size one,

$$ E = \frac{C_V}{C_{V_P}} \to 1 \text{ as } |T| \to 1, \tag{3.35} $$






assuming its evolutionary process does not become trapped at local optima. So, the tending of 
the Efficiency E provides a clustering coefficient. It tends, never quite reaching its maximum, 
because of the mutation inherent in the evolutionary process. 




Figure 3.6: 3D Fitness Landscape with No Optimum: Theoretical extreme scenario in which the
selection pressure is non-discriminating. So, the population occupancy of the fitness landscape
would then be uniformly random, as any position (sequence) has the same fitness as any other.
So the entropy (randomness) tends to maximum, resulting in the complexity $C_V$ tending to zero,
and therefore the Efficiency E also tending to zero.

The other extreme scenario occurs when the number of clusters equals the size of the population,
which would only occur with a flat fitness landscape [149] resulting from a non-discriminating
selection pressure, as shown in Figure 3.6. The population occupancy is uniformly random, as
any position (sequence) has the same fitness as any other. So the entropy (randomness) tends
to maximum, resulting in the complexity $C_V$ tending to zero, and therefore the Efficiency E
also tending to zero, while the number of clusters $|T|$ tends to the number of sequences in the
population $|S|$,

$$ E = \frac{C_V}{C_{V_P}} \to 0 \text{ as } |T| \to |S|. \tag{3.36} $$

So the number of clusters $|T|$ tends to the population size $|S|$, with each cluster consisting of
only one unique sequence (individual).

Figure 3.7: 3D Fitness Landscape with Global Optima: Clustering scenario, in which the
Efficiency E of the population S tends to a value based on the number of clusters $|T|$, because
the population of sequences is clustering around more than one global optimum, with each cluster
having an Efficiency E tending to a maximum of one.

If there are multiple global optima, as there are in Figure 3.7, the Efficiency E will tend to a
maximum below one, because the population of sequences consists of more than one cluster, with
each having an Efficiency tending to a maximum of one. The simplest scenario of clusters is pure
clusters; pure meaning that each cluster uses a distinct (mutually exclusive) subset of the
alphabet D relative to any other cluster. In this scenario the Efficiency E tends to a value
based on the number of clusters $|T|$, because a number of the $p_d(i)$ probabilities at each site
in (3.27) are the reciprocal of the number of clusters, $\frac{1}{|T|}$. So, given that the number of the
$p_d(i)$ probabilities taking the value $\frac{1}{|T|}$ is equal to the number of clusters, while the other $p_d(i)$
probabilities take a value of zero, then the per-site entropy calculation of $H_V(i)$ from (3.27)
becomes

$$ H_V(i) = \log_{|D|} |T|, \tag{3.37} $$

where $i$ is the site, $|D|$ is the alphabet size, and $|T|$ is the number of clusters. Hence, given
(3.37), (3.32), and (3.24), the Efficiency E from (3.33) becomes

$$ E = 1 - \log_{|D|} |T|, \tag{3.38} $$

where $|D|$ is the alphabet size and $|T|$ is the number of clusters. Therefore, the Efficiency E,
the clustering coefficient, tends to a value that can be used to determine the number of pure
clusters in an evolving population of sequences.
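
Because (3.38) is invertible, an observed limiting Efficiency can be turned back into an estimate of the number of pure clusters, $|T| = |D|^{1 - E}$. The sketch below shows both directions; the function names are our own, and the inverse is only meaningful under the pure-cluster assumption stated above.

```python
import math


def pure_cluster_efficiency(alphabet_size, cluster_count):
    """E = 1 - log_|D| |T|, the limiting Efficiency for |T| pure clusters (3.38)."""
    return 1 - math.log(cluster_count, alphabet_size)


def estimate_pure_clusters(alphabet_size, efficiency):
    """Invert (3.38): |T| = |D| ** (1 - E), returned unrounded."""
    return alphabet_size ** (1 - efficiency)


# An alphabet of fifteen with two pure clusters, and the inverse calculation.
limit = pure_cluster_efficiency(15, 2)
print(round(limit, 3), round(estimate_pure_clusters(15, limit)))
```

For an alphabet of fifteen and two pure clusters the limit is approximately 0.744, and a single cluster gives the expected limit of one.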



For a population S with clusters, each cluster is a sub-population with an Efficiency E tending
to a maximum of one. To specify this relationship we require a function that provides the
Efficiency E (3.33) of a population or sub-population of sequences,

$$ \mathit{efficiency}(\mathit{input} : \mathit{population}) : \mathit{int}. \tag{3.39} $$

So, for a population S consisting of a set of clusters T, each member (cluster) t is therefore a
sub-population of the population S, and is defined as

$$ t \in T \rightarrow \left( t \subseteq S \wedge \mathit{efficiency}(t) \to 1 \wedge |t| \approx \frac{|S|}{|T|} \wedge \sum_{t \in T} |t| = |S| \right), \tag{3.40} $$
where a cluster t has an Efficiency E tending to a maximum of one, and the cluster size $|t|$ is
approximately equal to the population size $|S|$ divided by the number of clusters $|T|$. It is only
approximately equal because of variation from mutation, and because the population size may
not divide to a whole number. These conditions are true for all members t of the set of clusters
T, and therefore the summation of the cluster sizes $|t|$ equals the size of the population $|S|$.

The population of sequences from the fitness landscape of Figure 3.7 is visualised in Figure 3.8, 
but the clusters within cannot be seen. So, the population is arranged to show the clustering 
in Figure 3.9, in which the two clusters are clearly evident. The clusters of the population have 
Efficiency values tending to a maximum of one, compared to the Efficiency of the population as 
a whole, which is tending to a maximum significantly below one. This is the expected behaviour 
of clusters as defined in (3.40). 

[Figure 3.8 image: Population with $C_V = 1.107$, $C_{V_P} = \ell_V = 3$, %E = 36.9; D = alphabet = {□, □, □}]

Figure 3.8: Population with Hidden Clusters: Visualisation for the population of sequences
from the fitness landscape of Figure 3.7, with clusters visually hard to identify. The clusters
lead to a low complexity $C_V$ relative to the maximum $C_{V_P}$, and hence the Efficiency E (3.33)
tends to a maximum significantly below one.

[Figure 3.9 image: Population arranged into Cluster 1 ($C_V = 2.704$, $C_{V_P} = 3$, E = 0.901) and Cluster 2 ($C_V = 3$, E = 1)]

Figure 3.9: Population with Clusters Visible: Visualisation for the population of sequences
from Figure 3.8, which has been arranged to show the clusters present. The clusters of the
population have Efficiency values tending to a maximum of one, compared to the Efficiency of
the population as a whole, which is tending to a maximum significantly below one.



The population size $|S|$, in Figures 3.8 and 3.9, is double the minimum requirement specified in
(3.26), so that the complexity $C_V$ (3.32) and Efficiency E (3.33) could be used in defining the
principles of clustering without redefining the length of a population of variable length sequences
$\ell_V$ (3.26). However, when determining the variable length $\ell_V$ of a cluster t, the sample size
requirement is different, specifically a cluster t is a sub-population of S, and therefore by
definition cannot have a population size equivalent to S (unless the population consists of only
one cluster). Therefore, managing clusters requires a reformulation of $\ell_V$ (3.26) to



$$ \mathit{sampleSize}(\ell_V) \geq \frac{|D|\ell_V}{|T|} \wedge \mathit{sampleSize}(\ell_V + 1) < \frac{|D|\ell_V}{|T|}, \tag{3.41} $$

where $\ell_{max}$ is the maximum length in a population of variable length sequences, $\ell_V$ varies
between $1 \leq \ell_V \leq \ell_{max}$, D is the alphabet, $|D| > 0$, and T is the set of clusters in the
population S.

A population with clusters will always have an Efficiency E tending towards a maximum
significantly below one. Therefore, managing populations with clusters requires a reformulation
of the Efficiency (3.33) to

$$ E_c(S) = \begin{cases} \dfrac{C_V}{C_{V_P}} & \text{if } |T| = 1 \\[2ex] \dfrac{\sum_{t \in T} E_c(t)}{|T|} & \text{if } |T| > 1 \end{cases} \tag{3.42} $$

where t is a cluster, and a member of the set of clusters T of the population S. So, the Efficiency
$E_c$ is equivalent to the Efficiency E if the population consists of only one cluster, but if there
are clusters then the Efficiency $E_c$ is the average of the Efficiency E values of the clusters.
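
As a minimal sketch of (3.42), assuming the clusters have already been identified and that each is summarised by its own complexity and complexity potential, the Efficiency $E_c$ is the mean of the per-cluster Efficiency values. The function name and the pair-based input format are our own assumptions.

```python
def clustered_efficiency(clusters):
    """E_c of (3.42) from (C_V, C_Vp) pairs, one pair per cluster.

    With a single cluster this is just C_V / C_Vp; with several clusters it
    is the mean of the per-cluster Efficiency values.
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    efficiencies = [c_v / c_vp for c_v, c_vp in clusters]
    return sum(efficiencies) / len(efficiencies)


# Cluster values reported in Figure 3.9 (both clusters have a potential of 3).
print(round(clustered_efficiency([(2.704, 3), (3, 3)]), 3))
# Cluster values reported in Figure 3.11.
print(clustered_efficiency([(3, 3), (2, 2)]))
```

For the clusters of Figure 3.9 this gives roughly 0.95, compared with 0.369 for the same population treated as a whole in Figure 3.8.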

[Figure 3.10 image: D = alphabet = {□, □, □, □}, four Agents with input and output mappings a→b, b→c, c→d and a→c]

Figure 3.10: Agent Atomicity: Property of a set of Agents, such that no single Agent can
functionally replace any Agent-sequence, i.e. their functionality is mutually exclusive to one
another. It is important because non-atomicity can adversely affect the Physical Complexity
measure. In this example, the alphabet is non-atomic, with the yellow Agent able to functionally
replace a green blue Agent-sequence.

Atomicity is the property of a set of Agents, such that no single Agent can functionally
replace any Agent-sequence, i.e. their functionality is mutually exclusive to one another. It
is important because non-atomicity can adversely affect the uniformity of the calculated
per-site entropies, which is the main construct of the Physical Complexity measure, and so
non-atomicity risks introducing error when calculating the information content. Our extensions to
Physical Complexity to support clustering are also necessary to manage non-atomicity, because
it leads to the formation of clusters within evolving Agent Populations. The presence of clusters
can be identified by the clustering coefficient, the Efficiency E tending to a value below one,
with the Efficiency $E_c$ (3.42) used to calculate the actual Efficiency as it supports clustering
and therefore non-atomicity.

[Figure 3.11 image: Population with E = 0.5, $C_V = 1$, $C_{V_P} = 2$, $E_c = 1$, with site 2 marked; Cluster 1 with $C_V = 3$, $C_{V_P} = 3$, E = 1; Cluster 2 with $C_V = 2$, $C_{V_P} = 2$, E = 1]

Figure 3.11: Population Constructed from a Non-Atomic Alphabet: The population is
constructed from the alphabet shown in Figure 3.10, with the yellow Agent able to functionally
replace a green blue Agent-sequence. So, the Efficiency E of the population is a half, whereas the
Efficiency $E_c$ for populations with clusters is one, because it supports clustering and therefore
non-atomicity.

If we consider the example population shown in Figure 3.11, which is constructed from the
alphabet shown in Figure 3.10, the yellow Agent □ can functionally replace a green blue
Agent-sequence □□, and so the uniformity across site two is lost. Therefore, the Efficiency
E of the population is a half, whereas the Efficiency $E_c$ for populations with clusters is one,
because it supports clustering and therefore non-atomicity.

3.2.3 Simulation and Results 

We simulated an evolving Agent Population from the Digital Ecosystem, using our simulation
from section 2.3 (unless otherwise specified), seeded with an alphabet (Agent-pool) of 15 Agents
for the evolutionary process. We also added the classes and methods necessary to calculate our
extended Physical Complexity and Efficiency, which required implementing the $C_V$ of (3.32),
the $\ell_V$ of (3.41) and the $H_V$ of (3.27) for the per-site entropies. The Efficiency $E_c$ (3.42), for
populations with clusters, was also implemented in the simulation.

3.2.3.1 Physical Complexity 

Our extended Physical Complexity has the same structure and properties as the original
Physical Complexity [7], and so the relationship between fitness and our extended Physical
Complexity should be the same as the relationship between fitness and the original Physical
Complexity [8]. If we consider the original Physical Complexity and fitness graphs, reprinted
from [8] in Figure 3.12, we can define the relationship as follows: the Physical Complexity
increases over the generations, suffering short-term decreases from the arrival of fitter mutants,
which spread through the population over several generations and cause the uniformity of
the sites to decrease temporarily, while the maximum fitness of the population increases over
the generations until the global optimum is reached, provided that there is a static selection
pressure and a low mutation rate (making it unlikely that the maximum fitness will decrease)
[8]. The original Physical Complexity starts uncharacteristically high in Figure 3.12, because
the population is seeded with a single sequence that temporarily takes over the population [8].

[Figure 3.12 image: two graphs, each with the x-axis Updates [×10^4]]

Figure 3.12: Original Physical Complexity Graphs (reprinted from [8]): The Physical
Complexity increases over the generations, suffering short-term decreases from the arrival of
fitter mutants, which spread through the population over several generations and cause the
uniformity of the sites to decrease temporarily, while the maximum fitness of the population
increases over the generations until the global optimum is reached.

Figure 3.13 shows, for a typical evolving Agent Population, the Physical Complexity $C_V$ (3.32)
for variable length sequences and the maximum fitness $F_{max}$ over the generations. The Physical
Complexity for variable length sequences increases over the generations, showing short-term
decreases as expected [8]. It increases over the generations because of the increasing information
being stored, with the sharp increases occurring when the effective length $\ell_V$ of the Population
increases. The temporary decreases, such as the one beginning at generation 138, are preceded
by the advent of a new fitter mutant, as indicated by a corresponding sharp increase in the
maximum fitness in the immediately preceding generations, which temporarily disrupts the
self-organised complexity of the population, until this new fitter mutant becomes dominant and
leads to a new higher level of self-organised complexity. Figure 3.13 shows that the fitness
and our extended Physical Complexity both increase over the generations, synchronised with
one another, until generation 160 when the maximum fitness tapers off more slowly than the
Physical Complexity. At this point the optimal length for the sequences is reached within the
simulation, and so the advent of new fitter sequences (of the same or similar length) creates
only minor fluctuations in the Physical Complexity, while having a more significant effect on
the maximum fitness.

Figure 3.13: Graph of Physical Complexity and Maximum Fitness over the Generations: The
Physical Complexity for variable length sequences increases over the generations, showing
short-term decreases as expected [8]. It increases over the generations because of the increasing
information being stored, with the sharp increases occurring when the effective length $\ell_V$ of the
Population increases.

The similarity of the graph in Figure 3.13 to the graphs in Figure 3.12 confirms that the
Physical Complexity measure has been successfully extended to variable length sequences. The
temporary decreases in the Physical Complexity $C_V$ for variable length sequences were not
as severe as the original [8], because our simulation's mutation rate was relatively low at only
10%. Also, our Physical Complexity $C_V$ does not start uncharacteristically high like the original,
because at the start the entire population was randomly seeded, instead of being seeded by just
a single individual as in the original [8].

3.2.3.2 Efficiency



Figure 3.14 is a visualisation of the simulation, showing two alternate Populations that were 
run for a thousand generations, with the one on the left from Figure 3.13 run under normal 
conditions, while the one on the right was run with a non-discriminating selection pressure; 
each multi-coloured line represents an Agent-sequence, while each colour represents an Agent 
(site). The visualisation shows that our Efficiency E accurately measures the self-organised 
complexity of the two Populations. It also shows significant variation in the Population run 
under normal conditions, as the evolutionary computing process creates the opportunity to find 
fitter (better) sequences, providing potential to avoid getting trapped at local optima. 



[Figure 3.14 image: Population (normal conditions) and Population (non-discriminating), each with an Agent-sequence (length) axis from 1 to $\ell_V$]

Figure 3.14: Visualisation of Evolving Agent Populations at the 1000th Generation: Each
multi-coloured line represents an Agent-sequence, while each colour represents an Agent (site).
The population on the left from Figure 3.13 was run under normal conditions, while the one
on the right was run with a non-discriminating selection pressure. The visualisation shows that
our Efficiency E accurately measures the self-organised complexity of the two Populations.

Figure 3.15 shows the Efficiency E (3.33), over the generations, for the Population from
Figure 3.13. The Efficiency tends to a maximum of one, indicating that the Population
consists of one cluster, which is confirmed by the visualisation of the Population in Figure
3.14 (left). The significant decreases that occurred in the Efficiency, reducing in magnitude
and frequency over the generations, came from mirroring the fluctuations that occurred in the
complexity $C_V$, because the Efficiency E (3.33) is the complexity $C_V$ (3.32) over the complexity
potential $C_{V_P}$ (3.24). These falls are caused by the creation of fitter (better) mutants within the
population, which eventually become the dominant genotype, but during the process cause
the Physical Complexity and the Efficiency to fall in the short-term.

Figure 3.15: Graph of Population Efficiency over the Generations for the Population from
Figure 3.13: The Efficiency tends to a maximum of one, indicating that the Population consists
of one cluster, which is confirmed by the visualisation of the Population in Figure 3.14 (left).
The significant decreases that occurred in the Efficiency, reducing in magnitude and frequency
over the generations, came from mirroring the fluctuations that occurred in the complexity $C_V$.



3.2.3.3 Clustering 



To further investigate the self-organised complexity of evolving Agent Populations, we simulated 
a typical Population with a multi-objective selection pressure that had two independent global 
optima (like the fitness landscape of Figure 3.7), and so the potential to support two pure 
clusters (each cluster using a unique subset of the alphabet $D$). The graph in Figure 3.16 shows 
the Efficiency $E$ over the generations acting as a clustering coefficient, oscillating around the 
included best-fit curve, quite significantly at the start, with the oscillations decreasing as the 
generations progressed. The initial severe oscillations were caused by the creation and spread of 
fitter, longer mutants (Agent-sequences) in the Population, causing the Physical Complexity and 
therefore the Efficiency to fluctuate significantly. The Efficiency tended to 0.744, as expected 
from (3.38) given the alphabet size was fifteen, $|D| = 15$, and the number of clusters was two, 
$|T| = 2$. The tending itself indicated clustering, while the value it tended to indicated, as 
expected, the presence of two clusters in the Population. A visualisation of the Population is 
shown in Figure 3.17, in which the Agent-sequences were grouped to show the two clusters. As 
expected from (3.40), each cluster had a much higher Physical Complexity and Efficiency compared 
to the Population as a whole. However, the Efficiency $E_c$ is immune to the clusters and 
therefore calculated the self-organised complexity of the Population correctly. 

Figure 3.16: Graph of the Clustering Coefficient over the Generations: The Efficiency oscillated 
around the included best-fit curve, quite significantly at the start, with the oscillations 
decreasing as the generations progressed. It tended to 0.744, as expected from (3.38) given the 
alphabet size was fifteen, $|D| = 15$, and the number of clusters was two, $|T| = 2$, indicating 
more than one cluster. 

Figure 3.17: Visualisation of Clusters in an Evolving Agent Population at the 1000th 
Generation: The Agent-sequences were grouped to show the two clusters, and as expected from 
(3.40) each cluster had a much higher Physical Complexity and Efficiency compared to the 
Population as a whole. However, the Efficiency $E_c$ calculated the complexity correctly. 
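The limiting value quoted above can be checked numerically. The following is a minimal sketch that assumes (3.38), which is not restated in this section, reduces for pure clusters to a limit of 1 - log_|D| |T|. This form is an assumption inferred from the quoted figures (it gives 0.744 for |D| = 15 and |T| = 2), and the function names are illustrative only.

```python
import math

def efficiency_limit(alphabet_size: int, num_clusters: int) -> float:
    """Assumed form of (3.38): value the Efficiency tends to for pure clusters."""
    return 1.0 - math.log(num_clusters, alphabet_size)

def clusters_from_limit(alphabet_size: int, limit: float) -> float:
    """Invert the assumed relation to estimate the number of pure clusters."""
    return alphabet_size ** (1.0 - limit)

limit = efficiency_limit(15, 2)
print(round(limit, 3))                        # 0.744
print(round(clusters_from_limit(15, limit)))  # 2
```

Under this assumed form a single cluster gives a limit of one, which is consistent with the clustering coefficient distinguishing a single-cluster Population from a Population with several clusters.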



3.2.4 Summary 



None of the existing definitions we considered [64, 24, 253, 48, 8] were directly applicable as a 
definition for the self-organised complexity of an evolving Agent Population, but the properties 
of Physical Complexity [8] closely matched our intuitive understanding, and so it was chosen for 
further investigation. Based upon information theory and entropy, it provides a measure of 
the quantity of information in a population's genome, relative to the environment in which it 
evolves, by calculating the entropy in the population to determine the randomness in the genome 
[8]. Reformulating Physical Complexity for evolving Agent Populations required consideration 
of the following issues: the mapping of the sequence sites to the Agent-sequences, and the 
managing of Populations of variable-length sequences. We then built upon this to construct a 
variant of the Physical Complexity called the Efficiency, so named because it is based on the 
efficiency of information storage in Physical Complexity, which we then used to develop an 
understanding of clustering and atomicity within evolving Agent Populations. 

We then investigated the self-organised complexity of evolving Agent Populations through 
experimental simulations, for which our extended Physical Complexity was consistent with 
the original. The Efficiency performed as expected, confirmed by the numerical results and 
Population visualisations matching our intuitive understanding. We then applied the Efficiency 
to the determination of clusters when subjecting an evolving Agent Population to a 
multi-objective selection pressure. The numerical results, combined with the visualisation of 
the multi-cluster Population, confirmed the ability of the Efficiency to act as a clustering 
coefficient, indicating not only the occurrence of clustering, but also the number of clusters 
(for pure clusters). We also confirmed that the Efficiency $E_c$ correctly calculated the 
self-organised complexity of evolving Agent Populations with clusters. 

Collectively, the experimental results confirm that Physical Complexity has been successfully 
extended to evolving Agent Populations. Most significantly, Physical Complexity has been 
reformulated algebraically for populations of variable-length sequences, which we have confirmed 
experimentally through simulations. Our Efficiency definition provides a macroscopic value to 
characterise the level of complexity. Furthermore, the clustering coefficient defined by the 
tending of the Efficiency not only indicates clustering, but can also distinguish between a 
single-cluster population and a population with clusters. The number of clusters can even 
be determined, for pure clusters, from the value to which the clustering coefficient tends. 
Combined, this allows the Efficiency $E_c$ definition to provide a normalised, universally 
applicable macroscopic value to characterise the complexity of a population, independent of 
clustering, atomicity, length (variable or same), and size. 

We have determined an effective understanding and quantification for the self-organised 
complexity of the evolving Agent Populations of our Digital Ecosystem. Furthermore, the 
understanding and techniques we have developed have applicability beyond evolving Agent 
Populations, as wide as the original Physical Complexity, which has been applied from DNA 
[7] to simulations of self-replicating programmes [172]. 

3.3 Stability 

A definition for the self-organised stability of an evolving Agent Population should define 
the resulting stability or instability that emerges over time, with no initial constraints from 
modelling approaches for the inclusion of pre-defined specific behaviour, but capable of 
representing the appearance of such behaviour should it occur. 

None of the proposed definitions are directly applicable for the self-organised stability of an 
evolving Agent Population. The G-machine modelling [64] is not applicable, because it is only 
defined within the context of pre-biotic populations. The Prügel-Bennett Shapiro formalism 
[253] is not suitable, because it necessitates the involvement of subjective human judgement at 
the critical stage of quantifier selection. Self-Organised Criticality [17] is also not applicable, as 
it only models the events of genetic change in the population over time, rather than measuring 
the resulting stability or instability of the population. Neither is Evolutionary Game Theory 
[330], which only defines the genetic stability of the genotypes, in terms of equilibrium and 
non-equilibrium dynamics, instead of the stability of the population as a whole. 

Chli-DeWilde stability of Multi-Agent Systems [53] does fulfil the required definition of the 
self-organised stability, measuring convergence to an equilibrium distribution. However, its 
current formulation does not include Multi-Agent Systems that make use of evolutionary 
computing algorithms, i.e. our evolving Agent Populations, but it could be extended to include 
such Multi-Agent Systems, because its Markov-based modelling approach is well established 
in evolutionary computing [272]. While there has been past work on modelling evolutionary 
computing algorithms as Markov chains [271, 232, 111, 89], we have found none including 
Multi-Agent Systems despite both being mature research areas [236, 195], because their 
integration is a recent development [290]. So, the use of Chli-DeWilde stability as a definition 
for the self-organised stability of evolving Agent Populations will be investigated further to 
determine its suitability. 

3.3.1 Chli-DeWilde Stability 

Chli-DeWilde stability was created to provide a clear notion of stability in Multi-Agent Systems 
[53], because stability is perhaps one of the most desirable features of any engineered system, 
given the importance of being able to predict its response to various environmental conditions 
prior to actual deployment. While computer scientists often talk about stable or unstable 
systems [306, 18], they have done so without a concrete or uniform definition of stability. 
Also, other properties had been widely investigated, such as openness [2], scalability [198] and 
adaptability [286], but stability had not. So, the Chli-DeWilde definition of stability for 
Multi-Agent Systems was created [53], based on the stationary distribution of a stochastic system, 
modelling the agents as Markov processes, and therefore viewing a Multi-Agent System as a 
discrete time Markov chain with a potentially unknown transition probability distribution. The 
Multi-Agent System is considered to be stable once its state has converged to an equilibrium 
distribution [53], because stability of a system can be understood intuitively as exhibiting 
bounded behaviour. 

Chli-DeWilde stability was derived [52] from the notion of stability defined by De Wilde [77, 
169], based on the stationary distribution of a stochastic system, making use of discrete-time 
Markov chains, which we will now introduce 3. Let $I$ be a countable set, in which each 
$i \in I$ is called a state and $I$ is called the state-space. We can then say that 
$\lambda = (\lambda_i : i \in I)$ is a measure on $I$ if $0 \le \lambda_i < \infty$ for all 
$i \in I$, and additionally a distribution if $\sum_{i \in I} \lambda_i = 1$ [52]. So, if $X$ is a 
random variable taking values in $I$ and we have $\lambda_i = \Pr(X = i)$, then $\lambda$ is 
the distribution of $X$, and we can say that a matrix $P = (p_{ij} : i, j \in I)$ is stochastic if 
every row $(p_{ij} : j \in I)$ is a distribution [52]. We can then extend familiar notions of 
matrix and vector multiplication to cover a general index set $I$ of potentially infinite size, by 
defining the multiplication of a matrix by a measure as $\lambda P$, which is given by 

$(\lambda P)_j = \sum_{i \in I} \lambda_i p_{ij}$. (3.43) 

3 A more comprehensive introduction to Markov chain theory and stochastic processes is available 
in [234] and [61]. 

We can now describe the rules for a Markov chain by a definition in terms of the corresponding 
matrix $P$ [52]. 

Definition 1. We say that $(X^t)_{t \ge 0}$ is a Markov chain with initial distribution 
$\lambda = (\lambda_i : i \in I)$ and transition matrix $P = (p_{ij} : i, j \in I)$ if: 

1. $\Pr(X^0 = i_0) = \lambda_{i_0}$ and 

2. $\Pr(X^{t+1} = i_{t+1} \mid X^0 = i_0, \ldots, X^t = i_t) = p_{i_t i_{t+1}}$. 

We abbreviate these two conditions by saying that $(X^t)_{t \ge 0}$ is Markov$(\lambda, P)$. 

In this first definition the Markov process is memoryless, resulting in only the current state of 
the system being required to describe its subsequent behaviour. We say that a Markov process 
$X^0, X^1, \ldots, X^t$ has a stationary distribution if the probability distribution of $X^t$ 
becomes independent of the time $t$ [53]. So, the following theorem is an easy consequence of the 
second condition from the first definition. 

Theorem 1. A discrete-time random process $(X^t)_{t \ge 0}$ is Markov$(\lambda, P)$ if and only 
if for all $t$ and $i_0, \ldots, i_t$ we have 

$\Pr(X^0 = i_0, \ldots, X^t = i_t) = \lambda_{i_0} p_{i_0 i_1} \cdots p_{i_{t-1} i_t}$. (3.44) 



This first theorem depicts the structure of a Markov chain, illustrating the relation with the 
stochastic matrix $P$, and defining its time-invariance property [52]. 

Theorem 2. Let $(X^t)_{t \ge 0}$ be Markov$(\lambda, P)$, then for all $t, s \ge 0$: 

1. $\Pr(X^t = j) = (\lambda P^t)_j$ and 

2. $\Pr(X^t = j \mid X^0 = i) = \Pr(X^{t+s} = j \mid X^s = i) = (P^t)_{ij}$. 

For convenience, $(P^t)_{ij}$ can be denoted as $p_{ij}^{(t)}$. 

Given this second theorem we can define $p_{ij}^{(t)}$ as the $t$-step transition probability from 
the state $i$ to $j$ [52], and we can now introduce the concept of an invariant distribution [52], 
in which we say that $\lambda$ is invariant if 

$\lambda P = \lambda$. (3.45) 

The next theorem will link the existence of an invariant distribution, which is an algebraic 
property of the matrix $P$, with the probabilistic concept of an equilibrium distribution. This 
only applies to a restricted class of Markov chains, namely those with irreducible and aperiodic 
stochastic matrices. However, there is a multitude of analogous results for other types of Markov 
chains to which we can refer [234, 61], and the following theorem is provided as an indication 
of the family of theorems that apply. An irreducible matrix $P$ is one for which, for all 
$i, j \in I$, there exists a sufficiently large $t$ such that $p_{ij}^{(t)} > 0$, and it is 
aperiodic if for all states $i \in I$ we have $p_{ii}^{(t)} > 0$ for all sufficiently large $t$ [52]. 

Theorem 3. Let $P$ be irreducible and aperiodic and have an invariant distribution, let $\lambda$ 
be any distribution, and suppose that $(X^t)_{t \ge 0}$ is Markov$(\lambda, P)$ [52], then 

$\Pr(X^t = j) \to p_j^\infty$ as $t \to \infty$ for all $j \in I$ (3.46) 

and 

$p_{ij}^{(t)} \to p_j^\infty$ as $t \to \infty$ for all $i, j \in I$. (3.47) 
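To make Theorems 2 and 3 concrete, the following minimal sketch iterates a small chain from two different initial distributions and checks that both approach the same invariant distribution. The three-state matrix is a hypothetical example chosen only because all of its entries are positive, which makes it irreducible and aperiodic; it is not taken from the Digital Ecosystem.

```python
def step(dist, P):
    """One step of the chain: the measure dist multiplied by the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_at(dist, P, t):
    """Distribution of X^t, i.e. (lambda P^t)_j from Theorem 2."""
    for _ in range(t):
        dist = step(dist, P)
    return dist

# Hypothetical 3-state stochastic matrix; each row sums to one.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

from_state_0 = distribution_at([1.0, 0.0, 0.0], P, 200)
from_state_2 = distribution_at([0.0, 0.0, 1.0], P, 200)

print([round(p, 4) for p in from_state_0])
# Both initial distributions reach the same limit (Theorem 3) ...
print(max(abs(a - b) for a, b in zip(from_state_0, from_state_2)) < 1e-12)
# ... and that limit is invariant under P, as in (3.45).
print(max(abs(a - b) for a, b in zip(from_state_0, step(from_state_0, P))) < 1e-12)
```

The limit does not depend on the initial distribution, which is the property that Definition 2 uses to characterise a stable system.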

We can now view a system $S$ as a countable set of states $I$ with implicitly defined transitions 
$P$ between them, and at time $t$ the state of the system is the random variable $X^t$, with the 
key assumption that $(X^t)_{t \ge 0}$ is Markov$(\lambda, P)$ [52]. 

Definition 2. The system $S$ is said to be stable when the distribution of its states converges 
to an equilibrium distribution, 

$\Pr(X^t = j) \to p_j^\infty$ as $t \to \infty$ for all $j \in I$. (3.48) 

More intuitively, the system $S$, a stochastic process $X^0, X^1, X^2, \ldots$, is stable if the 
probability distribution of $X^t$ becomes independent of the time index $t$ for large $t$ [53]. 
Most Markov chains with a finite state-space and positive transition probabilities are examples of 
stable systems, because after an initialisation period they settle down on a stationary 
distribution [52]. 

A Multi-Agent System can be viewed as a system S, with the system state represented by a 
finite vector X, having dimensions large enough to manage the agents present in the system. 
The state vector will consist of one or more elements for each agent, and a number of elements 
to define general properties of the system state. We can then model an agent as being dead, 
i.e. not being present in the system, by setting the vector elements for that agent to some 
predefined null value [52]. 

3.3.2 Extensions for Evolving Populations 

Extending Chli-DeWilde stability to the class of Multi-Agent Systems that make use of 
evolutionary computing algorithms, including our evolving Agent Populations, requires 
consideration of the following issues: the inclusion of population dynamics, and an understanding 
of population macro-states. 

3.3.2.1 Population Dynamics 

First, the Multi-Agent System of an evolving Agent Population is composed of $n$ 
Agent-sequences, with each Agent-sequence $i$ in a state $\xi_i^t$ at time $t$, where 
$i = 1, 2, \ldots, n$. The states of the Agent-sequences are random variables, and so the state 
vector for the Multi-Agent System is a vector of random variables $\xi^t$, with the time being 
discrete, $t = 0, 1, \ldots$. The interactions among the Agent-sequences are noisy, and are given 
by the probability distributions 

$\Pr(X_i \mid Y) = \Pr(\xi_i^{t+1} = X_i \mid \xi^t = Y), \quad i = 1, \ldots, n,$ (3.49) 

where $X_i$ is a value for the state of Agent-sequence $i$, and $Y$ is a value for the state vector 
of the Multi-Agent System. The probabilities implement a Markov process [300], with the 
noise caused by mutations. Furthermore, the Agent-sequences are individually subjected to a 
selection pressure from the environment of the system, which is applied equally to all the 
Agent-sequences of the population. So, the probability distributions are statistically 
independent, and 

$\Pr(X \mid Y) = \prod_{i=1}^{n} \Pr(\xi_i^{t+1} = X_i \mid \xi^t = Y)$. (3.50) 
If the occupation probability of state $X$ at time $t$ is denoted by $p_X^t$, then 

$p_X^t = \sum_Y \Pr(X \mid Y) \, p_Y^{t-1}$. (3.51) 

This is a discrete time equation used to calculate the evolution of the state occupation 
probabilities from $t = 0$, while equation (3.50) is the probability of moving from one state 
to another. The Multi-Agent System (evolving Agent Population) is self-stabilising if the limit 
distribution of the occupation probabilities exists and is non-uniform, i.e. 

$p_X^\infty = \lim_{t \to \infty} p_X^t$ (3.52) 

exists for all states $X$, and there exist states $X$ and $Y$ such that 

$p_X^\infty \ne p_Y^\infty$. (3.53) 

These equations state that, after an extended time, some configurations of the system will 
be more likely than others, and that the likelihood of their occurrence no longer changes. Such 
a system is stable, because the likelihood of states occurring no longer changes with time, 
which is the definition of stability developed in [53]. Equation (3.52) is the probabilistic 
equivalent of an attractor 4 in a system with deterministic interactions, which we had to extend 
to a stochastic process because mutation is inherent in evolutionary dynamics. 
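Equations (3.49) to (3.53) can be traced on a toy system. The sketch below is illustrative only: the two-agent population, the two agent states, and the transition rule in `agent_transition` are hypothetical choices, not the dynamics of our simulation. It builds the joint transition probabilities as the product in (3.50), iterates (3.51), and checks that the limit distribution is non-uniform.

```python
from itertools import product
from math import prod

AGENT_STATES = (0, 1)   # hypothetical: 0 = low fitness, 1 = high fitness
N_AGENTS = 2
STATES = list(product(AGENT_STATES, repeat=N_AGENTS))

def agent_transition(x_i, y):
    """Pr(X_i | Y) of (3.49) under a toy rule: an agent is likely to be fit
    once any fit agent exists, with the remainder acting as mutation noise."""
    p_high = 0.9 if any(y) else 0.1
    return p_high if x_i == 1 else 1.0 - p_high

def joint_transition(x, y):
    """Pr(X | Y) of (3.50): product over statistically independent agents."""
    return prod(agent_transition(x_i, y) for x_i in x)

def iterate_occupation(p, steps):
    """Apply (3.51) repeatedly: p_X^t = sum_Y Pr(X | Y) p_Y^(t-1)."""
    for _ in range(steps):
        p = {x: sum(joint_transition(x, y) * p[y] for y in STATES) for x in STATES}
    return p

initial = {x: 0.0 for x in STATES}
initial[(0, 0)] = 1.0   # start with every agent in the low-fitness state
limit = iterate_occupation(initial, 200)

for x in STATES:
    print(x, round(limit[x], 4))
# A non-uniform limit means this toy system satisfies (3.52) and (3.53).
print(len({round(v, 6) for v in limit.values()}) > 1)
```

Two hundred iterations stand in for the limit here, in the same spirit as using a large but finite generation count to approximate infinite time.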

Although the number of agents in the Chli-DeWilde formalism can vary, we require it to vary 
according to the selection pressure acting upon the evolving Agent Population. We must 
therefore formally define and extend the definition of dead agents, by introducing a new state 
$d$ for each Agent-sequence. If an Agent-sequence is in this state, $\xi_i^t = d$, then it is dead 
and does not affect the state of other Agent-sequences in the population. If an Agent-sequence $i$ 
has low fitness then that Agent-sequence will likely die, because 

$\Pr(d \mid Y) = \Pr(\xi_i^{t+1} = d \mid \xi^t = Y)$ (3.54) 

will be high for all $Y$. Conversely, if an Agent-sequence has high fitness, then it will likely 
replicate, assuming the state of a similarly successful Agent-sequence (mutant), or crossover 
might occur changing the state of the successful Agent-sequence and another Agent-sequence. 

4 An attractor is a set of states, invariant under the dynamics, towards which neighbouring states 
asymptotically approach during evolution [332]. 

3.3.2.2 Population Macro-States 

As we defined earlier, the state of an evolving Agent Population is determined by the collection 
of Agent-sequences of which it consists at a specific time t, potentially changing state as the 
time t increases. So, we can define a macro-state M as a set of states with a common property, 
here possessing at least one copy of the current maximum fitness individual. Therefore, by its 
definition, each macro-state M must also have a maximal state composed entirely of copies of 
the current maximum fitness individual. There must also be a macro-state consisting of all the 
states that have at least one copy of the global maximum fitness individual, which we will call 
the maximum macro-state $M_{max}$. 
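The membership rules for macro-states and maximal states can be sketched as follows. This is a minimal illustration under stated assumptions: Agent-sequences are represented as strings, and the fitness function (counting the Agent 'a') is hypothetical and stands in for the selection pressure.

```python
def in_macro_state(population, fitness, target_fitness):
    """True if the state holds at least one individual of the target fitness."""
    return any(fitness(seq) == target_fitness for seq in population)

def is_maximal_state(population, fitness, target_fitness):
    """True if the state consists entirely of copies of one individual
    that has the target fitness."""
    return len(set(population)) == 1 and in_macro_state(population, fitness, target_fitness)

def fitness(seq):
    """Hypothetical fitness: the number of 'a' Agents in an Agent-sequence."""
    return seq.count("a")

GLOBAL_MAX = 3  # best possible fitness for sequences of length three

population = ["aaa", "aab", "aaa", "bab"]
print(in_macro_state(population, fitness, GLOBAL_MAX))     # True
print(is_maximal_state(population, fitness, GLOBAL_MAX))   # False
print(is_maximal_state(["aaa"] * 4, fitness, GLOBAL_MAX))  # True
```

The first population belongs to the maximum macro-state without being its maximal state, which is the situation the system is expected to occupy when mutation is present.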



Figure 3.18: State-Space of an Evolving Agent Population: A possible evolutionary path through 
the state-space I is shown, with the selection pressure of the evolutionary process driving it 
towards the maximal state of the maximum macro-state $M_{max}$, which consists entirely of copies 
of the optimal solution, and is the equilibrium state that the system S is forever falling towards 
without ever quite reaching, because of the noise (mutation) within the system. 

We can consider the macro-states of an evolving Agent Population visually through the 
representation of the state-space I of the system S shown in Figure 3.18, which includes a 
possible evolutionary path through the state-space I. Traversal through the state-space I is 
directed by the selection pressure of the evolutionary process acting upon the Population S, 
driving it towards the maximal state of the maximum macro-state $M_{max}$, which consists entirely 
of copies of the optimal solution, and is the equilibrium state that the system S is forever 
falling towards without ever quite reaching, because of the noise (mutation) within the system. 
So, while this maximal state will never be reached, the maximum macro-state $M_{max}$ itself is 
certain to be reached, provided the system does not get trapped at local optima, i.e. the 
probability of being in the maximum macro-state $M_{max}$ at infinite time is one, 
$p_{M_{max}}^\infty = 1$, as defined from equation (3.51). 

Furthermore, we can define quantitatively the probability distribution of the macro-states that 
the system occupies at infinite time. For a stable system, as defined by equation (3.53), the 
degree of instability, $d_{ins}$, can be defined as the entropy of its probability distribution at 
infinite time, 

$d_{ins} = H(p^\infty) = -\sum_X p_X^\infty \log_N(p_X^\infty)$, (3.55) 

where N is the number of possible states, and taking log to the base N normalises the degree 
of instability. The degree of instability will range between zero (inclusive) and one (exclusive), 
because a maximum instability of one would only occur during the theoretical extreme scenario 
of a non-discriminating selection pressure [149] (as shown in Figure 3.6). 
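The degree of instability of (3.55) can be computed directly from a limit distribution. The following is a minimal sketch; the function name and the example distributions are illustrative, and terms with zero probability are skipped under the usual convention that 0 log 0 = 0, which is an assumption not stated explicitly above.

```python
import math

def degree_of_instability(limit_probs):
    """Entropy of the limit distribution with logarithms to base N, as in (3.55).

    limit_probs holds one probability per possible state, so N = len(limit_probs).
    Zero-probability terms are skipped (convention: 0 log 0 = 0).
    """
    n = len(limit_probs)
    if n < 2:
        return 0.0  # a single possible state carries no entropy
    return sum(-p * math.log(p, n) for p in limit_probs if p > 0.0)

print(degree_of_instability([1.0, 0.0, 0.0, 0.0]))        # all mass on one state
print(round(degree_of_instability([0.25] * 4), 6))        # uniform limit distribution
print(round(degree_of_instability([0.77, 0.09, 0.09, 0.05]), 3))
```

A distribution concentrated on one state gives zero, while a uniform distribution gives the theoretical maximum of one, matching the range described above.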



3.3.3 Simulation and Results 

We simulated an evolving Agent Population from the Digital Ecosystem, using our simulation 
from section 2.3 (unless otherwise specified), seeded with an Agent-pool of 20 Agents 5 for the 
evolutionary process. We also added the classes and methods necessary to implement our 
extended Chli-DeWilde stability and degree of instability, which required calculating $p_X^t$ of 
(3.51) to estimate the stability, and $p_X^\infty$ of (3.52) to establish the existence of 
$p_X^\infty \ne p_Y^\infty$ from (3.53). The degree of instability, $d_{ins}$ of (3.55), was also 
implemented in the simulation. 

3.3.3.1 Stability 

Our evolving Agent Population (a Multi-Agent System with evolutionary dynamics) is stable 
if the distribution of the limit probabilities exists and is non-uniform, as defined by equations 
(3.52) and (3.53). The simplest case is a typical evolving Agent Population with one global 
optimal solution, which is stable if there are at least two macro-states with different limit 
occupation probabilities. We shall consider the maximum macro-state $M_{max}$ and the 
sub-optimal macro-state $M_{half}$, where the states of the macro-state $M_{max}$ each possess at 
least one individual with global maximum fitness, 

$p_{M_{max}}^\infty = \lim_{t \to \infty} p_{M_{max}}^t = 1$, 

while the states of the macro-state $M_{half}$ each possess at least one individual with a fitness 
equal to half of the global maximum fitness, 

$p_{M_{half}}^\infty = \lim_{t \to \infty} p_{M_{half}}^t = 0$, 

thereby fulfilling the requirements of equations (3.52) and (3.53). The sub-optimal macro-state 
$M_{half}$, having a lower fitness, is predicted to be seen earlier in the evolutionary process 
before disappearing as higher fitness macro-states are reached. The system S will take longer to 
reach the maximum macro-state $M_{max}$, but once it does will likely remain, leaving only briefly 
depending on the strength of the mutation rate, as the selection pressure is non-elitist 6 (as 
defined in section 2.3). 

5 From optimisation improvements to the code base of the simulation, we were able to increase the 
size of the Agent-pool from 15 to 20 without any significant degradation in performance, within 
the scope of running tens of thousands of simulation runs. 

A value of $t = 1000$ was chosen to represent $t = \infty$ experimentally, because the simulation 
has often been observed to reach the maximum macro-state $M_{max}$ within 500 generations. 
Therefore, the probability of the system S being in the maximum macro-state $M_{max}$ at the 
thousandth generation is expected to be one, $p_{M_{max}}^{1000} = 1$. Furthermore, the 
probability of the system being in the sub-optimal macro-state $M_{half}$ at the thousandth 
generation is expected to be zero, $p_{M_{half}}^{1000} = 0$. 

Figure 3.19 shows, for a typical evolving Agent Population, a graph of the probability as defined 
by equation (3.51) of the maximum macro-state $M_{max}$ and the sub-optimal macro-state 
$M_{half}$ at each generation, averaged from ten thousand simulation runs for statistical 
significance. The behaviour of the simulated system S was as expected, being in the maximum 
macro-state $M_{max}$ only after generation 178 and always after generation 482. It was also 
observed being in the sub-optimal macro-state $M_{half}$ only between generations 37 and 113, 
with a maximum probability of 0.053 (3 d.p.) at generation 61; this probability was low because 
the evolutionary path (state transitions) could avoid visiting the macro-state. As expected, the 
probability of being in the maximum macro-state $M_{max}$ at the thousandth generation was one, 
$p_{M_{max}}^{1000} = 1$, and so the probability of being in any other macro-state, including the 
sub-optimal macro-state $M_{half}$, at the thousandth generation was zero, 
$p_{M_{half}}^{1000} = 0$. 

6 Non-elitist meaning that the best individual from one generation was not guaranteed to survive 
to the next generation; it had a high probability of surviving into the next generation, but it 
was not guaranteed as it might have been mutated [90]. 

Figure 3.19: Graph of the Probabilities of the Macro-States $M_{max}$ and $M_{half}$ at each 
Generation: The system S, a typical evolving Agent Population, was in the maximum macro-state 
$M_{max}$ only after generation 178 and always after generation 482. It was also observed being 
in the sub-optimal macro-state $M_{half}$ only between generations 37 and 113, with a maximum 
probability of 0.053 (3 d.p.) at generation 61. 
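The averaging over simulation runs described above amounts to estimating, at each generation, the fraction of runs whose state lies in a given macro-state. The sketch below illustrates only that estimation step. The function `toy_run` is a hypothetical stand-in for a simulation run, its transition probabilities are arbitrary, and it simplifies macro-states to the best fitness level present, so its numbers bear no relation to the results reported in Figure 3.19.

```python
import random

def toy_run(generations, rng):
    """Hypothetical stand-in for one simulation run: returns the best fitness
    level present at each generation (0 = low, 1 = half, 2 = global maximum)."""
    level, history = 0, []
    for _ in range(generations):
        r = rng.random()
        if level == 0 and r < 0.05:
            level = 2 if rng.random() < 0.5 else 1  # the path may skip 'half'
        elif level == 1 and r < 0.10:
            level = 2
        history.append(level)
    return history

def macro_state_probabilities(runs, generations, level, seed=0):
    """Estimate the probability of a macro-state at each generation as the
    fraction of runs whose state is in the macro-state given by `level`."""
    rng = random.Random(seed)
    counts = [0] * generations
    for _ in range(runs):
        for t, current in enumerate(toy_run(generations, rng)):
            counts[t] += current == level
    return [c / runs for c in counts]

p_max = macro_state_probabilities(runs=500, generations=500, level=2)
p_half = macro_state_probabilities(runs=500, generations=500, level=1)
print(p_max[-1], p_half[-1])  # estimates at the final generation
print(max(p_half))            # peak probability of the sub-optimal macro-state
```

Using the same seed for both calls means both estimates are computed from the same set of runs, so they can be compared generation by generation.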

A visualisation for the state of a typical evolving Agent Population at the thousandth generation 
is shown in Figure 3.20, with each line representing an Agent-sequence and each colour 
representing an Agent, with the identical Agent-sequences grouped for clarity. It shows that 
the evolving Agent Population reached the maximum macro-state $M_{max}$ and remained there, 
but as expected never reached the maximal state of the maximum macro-state $M_{max}$, where 
all the Agent-sequences are identical and have maximum fitness, which is indicated by the lack 
of total uniformity in Figure 3.20. This was expected, because of the mutation (noise) within 
the evolutionary process, which is necessary to create the opportunity to find fitter (better) 
sequences and potentially avoid getting trapped at any local optima that may be present. 

Figure 3.20: Visualisation of an Evolving Agent Population at the 1000th Generation: The 
Population consists of multiple Agent-sequences, with each line representing an Agent-sequence, 
and each colour representing an Agent. The identical Agent-sequences were grouped 
for clarity, and as expected the system S reached the maximum macro-state $M_{max}$ and remained 
there, but never reached the maximal state of the maximum macro-state $M_{max}$. 



3.3.3.2 Degree of Instability 

Given that our simulated evolving Agent Population is stable as defined by equations (3.52) and 
(3.53), we can determine the degree of instability as defined by equation (3.55). So, calculated 
from its limit probabilities, the degree of instability was 

$d_{ins} = H(p^{1000}) = -\sum_X p_X^{1000} \log_N(p_X^{1000})$ 
$= -1 \log_N(1)$ 
$= 0,$ 

where $t = 1000$ is an effective estimate for $t = \infty$, as explained earlier. The result was as 
expected because the probability of being in the maximum macro-state $M_{max}$ at the thousandth 
generation was one, $p_{M_{max}}^{1000} = 1$, and so the probability of being in the other 
macro-states at the thousandth generation was zero. The system therefore shows no instability, as 
there is no entropy in the occupied macro-states at infinite time. 

3.3.3.3 Stability Analysis 

Figure 3.21: Graph of Stability with Different Mutation and Crossover Rates: With the 
mutation rate at or below 60%, the evolving Agent Population showed no instability, with 
$d_{ins}$ values equal to zero as the system S was always in the same macro-state M at infinite 
time, independent of the crossover rate. With the mutation rate above 60% the instability 
increased significantly. 



We then performed a stability analysis (similar to a sensitivity analysis [43]) of a typical evolving 
Agent Population, varying key parameters within the simulation. We varied the mutation and 
crossover rates from 0% to 100% in 10% increments, calculating the degree of instability, 
$d_{ins}$ from (3.55), at the thousandth generation. These degree of instability values were 
averaged over ten thousand simulation runs, and graphed against the mutation and crossover rates 
in Figure 3.21. It shows that the crossover rate had little effect on the stability of our 
simulated evolving Agent Population, whereas the mutation rate did significantly affect the 
stability. With the mutation rate at or below 60%, the evolving Agent Population showed no 
instability, with $d_{ins}$ values equal to zero as the system S was always in the same macro-state 
M at infinite time, independent of the crossover rate. With the mutation rate above 60% the 
instability increased significantly, with the system being in one of several different 
macro-states at infinite time; with a mutation rate of 70% the system was still very stable, 
having low $d_{ins}$ values ranging between 0.08 (2 d.p.) and 0.16 (2 d.p.), but once the mutation 
rate was 80% or greater the system became quite unstable, shown by high $d_{ins}$ values nearing 
0.5. 

As one would have expected, an extremely high mutation rate has a destabilising effect on 
an evolving Agent Population. The crossover rate had only a minimal effect, 
because variation from crossover was limited when the Population had matured, consisting of 
Agent-sequences identical or very similar to one another. It should also be noted that the 
stability of the system is different to its performance, because although showing no instability 
with mutation rates at or below 60%, it only reached the maximum macro-state $M_{max}$ 
with a mutation rate of 10% or above, while at 0% it was stable at a sub-optimal macro-state. 
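One way to obtain a degree of instability value from a batch of runs at a single parameter setting is to form the empirical distribution of the macro-state observed at the final generation and take its entropy to base N. The sketch below shows this under stated assumptions: the function name, the macro-state labels, and the hand-written outcome lists are illustrative only and are not simulation results.

```python
import math
from collections import Counter

def instability_from_runs(final_macro_states, n_macro_states):
    """Estimate d_ins of (3.55) from the macro-state observed at the final
    generation of each run, using logarithms to base n_macro_states."""
    runs = len(final_macro_states)
    counts = Counter(final_macro_states)
    return sum(-(c / runs) * math.log(c / runs, n_macro_states) for c in counts.values())

# Hand-written outcomes for illustration only (not simulation results).
always_max = ["max"] * 10
mixed = ["max"] * 6 + ["half"] * 3 + ["low"] * 1

print(instability_from_runs(always_max, n_macro_states=3))
print(round(instability_from_runs(mixed, n_macro_states=3), 3))
```

Repeating such a calculation for each pair of mutation and crossover rates would give a surface of the kind plotted in Figure 3.21, with zero wherever every run ends in the same macro-state.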

3.3.4 Summary 

None of the existing definitions we considered [64, 253, 17, 53] were directly applicable as a 
definition for the self-organised stability of an evolving Agent Population, but the properties 
of Chli-DeWilde stability [53] closely matched our intuitive understanding, and so it was chosen 
for further investigation. It views a Multi-Agent System as a discrete time Markov chain (with 
potentially unknown transition probabilities) that is considered to be stable when its state, 
a stochastic process, has converged to an equilibrium distribution. Extending Chli-DeWilde 
stability to the Multi-Agent System of an evolving Agent Population required consideration of 
the following issues: the inclusion of population dynamics, and an understanding of population 
macro-states. We then built upon this to construct an entropy-based definition for the degree 
of instability (entropy of the limit probabilities), which was later used to perform a stability 
analysis of an evolving Agent Population. 

We then investigated the self-organised stability of evolving Agent Populations through 
experimental simulations, and the results showed that there was a limit probability distribution, 
and that it was non-uniform. Furthermore, the reaching of the maximum macro-state was 
confirmed by a visualisation matching the numerical results. We then applied our degree 
of instability to determine that there was no instability under normal conditions, and then 
performed a stability analysis (similar to a sensitivity analysis [43]) showing the variation of the 
self-organised stability under varying conditions. Collectively, the experimental results confirm 
that Chli-DeWilde stability has been successfully extended to evolving Agent Populations, while 
our definition for the degree of instability provides a macroscopic value to characterise the level 
of stability. 

We have determined an effective understanding and quantification for the self-organised stability 
of the evolving Agent Populations of our Digital Ecosystem. Also, our extended Chli-DeWilde 
stability is applicable to other Multi-Agent Systems with evolutionary dynamics. Furthermore, 
our degree of instability provides a definition for the level of stability, applicable to 
Multi-Agent Systems with or without evolutionary dynamics. 

3.4 Diversity 

A definition for the self-organised diversity of an evolving Agent Population should define the 
optimal variability, of the Agents and Agent-sequences, that emerges over time, with no initial 
constraints from modelling approaches for the inclusion of pre-defined specific behaviour, but 
capable of representing the appearance of such behaviour should it occur. 

None of the proposed definitions are applicable for the self-organised diversity of an evolving 
Agent Population. The G-machine modelling [64] is not applicable, because it is only defined 
within the context of pre-biotic populations. Neither the Minimum Description Length 
principle [24] nor the Prügel-Bennett Shapiro formalism [253] is suitable, because they necessitate 
the involvement of subjective human judgement at the critical stages of model or quantifier 
selection. Mean Field Theory is also not applicable because of the necessity of a neighbourhood 
model for defining interaction, and evolving Agent Populations lack a 2D or 3D metric space 
for such models. So, the only available neighbourhood model becomes a distance measure on 
a parameter space that measures dissimilarity. However, this type of neighbourhood model 
cannot represent the information-based interactions between the individuals of an evolving 
Agent Population. 

We suggest that the uniqueness of Digital Ecosystems makes the application of existing 
definitions inappropriate for the self-organised diversity, because while we could extend a 
biology-centric definition for the self-organised complexity, and a computing-centric definition 
for the self-organised stability, we found neither of these approaches, nor any other, appropriate 
for the self-organised diversity. The Digital Ecosystem being the digital counterpart of a 
biological ecosystem gives it unique properties, as discussed earlier in section 2.4. So, the 
evolving Agent Populations possess properties of both computing systems (e.g. agent systems) 
as well as biological systems (e.g. population dynamics), and the combination of these 
properties makes them unique. So, we will further consider the evolving Agent Populations 
to create a definition for their self-organised diversity. 

3.4.1 Evolving Agent Populations 

The self-organised diversity of an evolving Agent Population comes from the Agent-sequences 
it evolves, in response to the selection pressure, seeded with Agents and Agent-sequences from 
the Agent-pool of the Habitat in which it is instantiated. The set of Agents and Agent- 
sequences available when seeding an evolving Agent Population is regulated over time by 
other evolving Agent Populations, instantiated in response to other user requests, leading to 
the death and migration of Agents and Agent-sequences, as well as the formation of new 
Agent-sequence combinations. The seeding of existing Agent-sequences provides a direction to 
accelerate the evolutionary process, and can also affect the self-organised diversity; for example, 
if only a proportion of any available global optima is favoured. So, the set of Agents available 
when seeding an evolving Agent Population provides potential for the self-organised diversity, 
while the selection pressure of a user request provides a constraining factor on this potential. 
Therefore, the optimality of the self-organised diversity of an evolving Agent Population is 
relative to the selection pressure of the user request for which it was instantiated. 

While we could measure the self-organised diversity of individual evolving Agent Populations, 
or even take a random sampling, it will be more informative to consider their collective self- 
organised diversity. Also, given that the Digital Ecosystem is required to support a range of 
user behaviour, we can consider the collective self-organised diversity of the evolving Agent 
Populations relative to the global user request behaviour. So, when varying a behavioural 
property of the user requests according to some distribution, we would expect the corresponding 
property of the evolving Agent Populations to follow the same distribution. We are not 
intending to prescribe the expected user behaviour of the Digital Ecosystem, but investigate 
whether the Digital Ecosystem can adapt to a range of user behaviour in terms of the self- 
organised diversity. So, we will consider Uniform, Gaussian (Normal) and Power distributions 
for the parameters of the user request behaviour. The Uniform distribution will provide a 
control, while the Normal (Gaussian) distribution will provide a reasonable assumption for the 
behaviour of a large group of users, and the Power distribution will provide a relatively extreme 
variation in user behaviour. 
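
As an illustration only, the three user request distributions could be sampled along the following lines. This is a minimal sketch and not the simulation code used for the experiments: the function name, the bounds `low` and `high`, the Gaussian spread and the Pareto shape parameter are all assumptions introduced here.

```python
import random


def sample_request_lengths(distribution, n, low=2, high=18, seed=0):
    """Draw n integer user request lengths in [low, high].

    The bounds, the spread and the Pareto shape are hypothetical values
    chosen for illustration; they are not taken from the experiments.
    """
    rng = random.Random(seed)
    mid = (low + high) / 2.0
    spread = (high - low) / 6.0  # assumed: about three standard deviations per side
    lengths = []
    while len(lengths) < n:
        if distribution == "uniform":
            value = rng.randint(low, high)
        elif distribution == "gaussian":
            value = round(rng.gauss(mid, spread))
        elif distribution == "power":
            # A Pareto-distributed offset from the lower bound favours short requests.
            value = low + int(rng.paretovariate(1.5)) - 1
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        if low <= value <= high:  # rejection sampling keeps values in range
            lengths.append(value)
    return lengths
```

The same sampler could be reused for the modularity property by changing the bounds, so that each scenario differs only in the distribution name.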

3.4.2 Simulation and Results 

We simulated the Digital Ecosystem, using our simulation from section 2.3 (unless otherwise 
specified). We also added the classes and methods necessary to vary aspects of the user 
behaviour according to different distributions, and a way to measure the related aspects of the 
evolving Agent Populations. This consisted of a mechanism to vary the user request properties 
of length and modularity, according to Uniform, Gaussian (Normal) and Power distributions, and 
a mechanism to measure the corresponding Agent(-sequence) properties of length and number 
of attributes. For statistical significance, each scenario (experiment) will be averaged over ten 
thousand simulation runs. We expect it will be obvious whether the observed behaviour of the 
Digital Ecosystem matches the expected behaviour from the user base. Nevertheless, we will 
also implement a chi-squared (χ²) test to determine whether the observed behaviour (distribution) of 
the Agent(-sequence) properties matches the expected behaviour (distribution) from the user 
request properties. 
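
The comparison of observed and expected frequencies can be sketched as follows. This is a hedged illustration of the standard chi-squared statistic and not our simulation code: the function names are hypothetical, the example frequencies are invented, and the critical value is passed in as a parameter so that whichever tabulated value is appropriate can be supplied.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories (Pearson's chi-squared)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def matches_expected(observed, expected, critical_value):
    """True when the statistic falls below the supplied critical value."""
    return chi_squared_statistic(observed, expected) < critical_value


# Invented example: seventeen categories, i.e. sixteen degrees of freedom.
expected = [100] * 17
observed = [110, 90] + [100] * 15
statistic = chi_squared_statistic(observed, expected)  # (100 + 100) / 100 = 2.0
passed = matches_expected(observed, expected, critical_value=7.962)
```

Here 7.962 is the critical value quoted in the experiments below for sixteen degrees of freedom; the frequencies themselves are illustrative and are not experimental data.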

Given the requirement to run a minimum of sixty thousand simulation runs, ten thousand for 
each experiment, we adapted the code base of the simulation to take advantage of the Xgrid 
[159] distributed computing technology, and therefore make use of the grids mentioned in the 
acknowledgements. 



3.4.2.1 User Request Length 

We started by varying the user request length according to the available distributions, expecting 
the length of the Agent-sequences to be distributed according to the length of the user requests, 
i.e. the longer the user request, the longer the Agent-sequence needed to fulfil it. 

We first applied the Uniform distribution as a control, and graphed the results in Figure 3.22. 
The observed frequencies of the Agent-sequence length mostly matched the expected frequencies, 
which was confirmed by a χ² test; with a null hypothesis of no significant difference and sixteen 
degrees of freedom, the χ² value was 2.588 (3 d.p.), below the critical 0.95 χ² value of 7.962. 

We then applied the Gaussian distribution as a reasonable assumption for the behaviour of 
a large group of users, and graphed the results in Figure 3.23. The observed frequencies of 
the Agent-sequence length matched the expected frequencies with only very minor variations, 
which was confirmed by a χ² test; with a null hypothesis of no significant difference and sixteen 
degrees of freedom, the χ² value was 2.102 (3 d.p.), below the critical 0.95 χ² value of 7.962. 

Figure 3.22: Graph of Uniformly Distributed Agent-Sequence Length Frequencies: The observed 
frequencies of the Agent-sequence length mostly matched the expected frequencies, which was 
confirmed by a χ² test; with a null hypothesis of no significant difference and sixteen degrees of 
freedom, the χ² value was 2.588 (3 d.p.), below the critical 0.95 χ² value of 7.962. 

Figure 3.23: Graph of Gaussian Distributed Agent-Sequence Length Frequencies: The observed 
frequencies of the Agent-sequence length matched the expected frequencies with only very minor 
variations, which was confirmed by a χ² test; with a null hypothesis of no significant difference 
and sixteen degrees of freedom, the χ² value was 2.102 (3 d.p.), below the critical 0.95 χ² value of 7.962. 

Figure 3.24: Graph of Power Distributed Agent-Sequence Length Frequencies: The observed 
frequencies of the Agent-sequence length matched the expected frequencies with some variation, 
which was confirmed by a χ² test; with a null hypothesis of no significant difference and sixteen 
degrees of freedom, the χ² value was 5.048 (3 d.p.), below the critical 0.95 χ² value of 7.962. 



Finally, we applied the Power distribution to represent a relatively extreme variation in user 
behaviour, and graphed the results in Figure 3.24. The observed frequencies of the Agent- 
sequence length matched the expected frequencies with some variation, which was confirmed by 
a χ² test; with a null hypothesis of no significant difference and sixteen degrees of freedom, the 
χ² value was 5.048 (3 d.p.), below the critical 0.95 χ² value of 7.962. 



There were a couple of minor discrepancies, common to all the experiments. First, there were 
a small number of individual Agents at the thousandth time step, caused by the typical user 
behaviour of continuously creating new services (Agents). Second, while the chi-squared tests 
confirmed that there was no significant difference between the observed and expected frequencies 
of the Agent-sequence length, there was still a bias to longer Agent-sequences (solutions). 
This bias was evident visually in the graphs of the experiments, and evident numerically in the 
chi-squared test of the Power distribution experiment, because that distribution favoured shorter 
Agent-sequences. The cause of this bias was most likely some aspect of bloat (as we discussed in 
section 2.2.3.3) that was not fully controlled. 

3.4.2.2 User Request Modularity 

Next, we varied the user request modularity according to the available distributions, expecting 
the sophistication of the Agents to be distributed according to the modularity of the user 
requests, i.e. the more complicated (in terms of modular non-reducible tasks) the user request, 
the more sophisticated (in terms of the number of attributes) the Agents needed to fulfil it. 

We first applied the Uniform distribution as a control, and graphed the results in Figure 3.25. 
The observed frequencies for the number of Agent attributes mostly matched the expected 
frequencies, which was confirmed by a χ² test; with a null hypothesis of no significant difference 
and ten degrees of freedom, the χ² value was 1.049 (3 d.p.), below the critical 0.95 χ² value of 
3.940. 

[Plot for Figure 3.25: expected and observed frequencies against the number of attributes per Agent, 5 to 15.] 

Figure 3.25: Graph of Uniformly Distributed Agent Attribute Frequencies: The observed 
frequencies for the number of Agent attributes mostly matched the expected frequencies, which 
was confirmed by a χ² test; with a null hypothesis of no significant difference and ten degrees 
of freedom, the χ² value was 1.049 (3 d.p.), below the critical 0.95 χ² value of 3.940. 

We then applied the Gaussian distribution as a reasonable assumption for the behaviour of a 
large group of users, and graphed the results in Figure 3.26. The observed frequencies for the 
number of Agent attributes appeared to follow the expected frequencies, but there was significant 
variation which led to a failed χ² test; with a null hypothesis of no significant difference and 
ten degrees of freedom, the χ² value was 50.623 (3 d.p.), not below the critical 0.95 χ² value of 
3.940. 

Figure 3.26: Graph of Gaussian Distributed Agent Attribute Frequencies: The observed 
frequencies for the number of Agent attributes appeared to follow the expected frequencies, 
but there was significant variation which led to a failed χ² test; with a null hypothesis of no 
significant difference and ten degrees of freedom. 



[Plot for Figure 3.27: expected and observed frequencies (vertical axis ticks from 500 to 2000) against the number of attributes per Agent.] 

Figure 3.27: Graph of Power Distributed Agent Attribute Frequencies: The observed frequencies 
for the number of Agent attributes appeared to follow the expected frequencies, but there was 
significant variation which led to a failed χ² test; with a null hypothesis of no significant 
difference and ten degrees of freedom. 

Finally, we applied the Power distribution to represent a relatively extreme variation in user 
behaviour, and graphed the results in Figure 3.27. The observed frequencies for the number of 
Agent attributes appeared to follow the expected frequencies, but there was significant variation 
which led to a failed χ² test; with a null hypothesis of no significant difference and ten degrees 
of freedom, the χ² value was 61.876 (3 d.p.), not below the critical 0.95 χ² value of 3.940. 

In all the experiments the observed frequencies of the number of Agent attributes appeared to 
follow the expected frequencies, but this could only be confirmed statistically, by a χ² test, for 
the Uniform distribution experiment. In the Gaussian and Power distribution experiments the 
χ² tests failed by considerable margins, most likely because the evolving Agent Populations were 
still self-organising to match the user behaviour: the observed frequencies were approaching 
the expected frequencies, but not yet sufficiently to pass the χ² tests, because by the thousandth 
time step (user request event) each user had placed an average of only ten requests. 

3.4.3 Summary 

None of the existing definitions we considered [64, 24, 253, 95] were applicable as a definition 
for the self-organised diversity of the evolving Agent Populations. So, we further considered 
the unique properties resulting from information-centric Digital Ecosystems being the digital 
counterpart of energy-centric biological ecosystems, creating our own definition for the self- 
organised diversity of an evolving Agent Population, relative to the selection pressure provided 
by a user request. We then considered the collective self-organised diversity of the evolving 
Agent Populations relative to the global user request behaviour. Therefore, when varying 
a behavioural property of the user requests according to some distribution, we expected the 
corresponding property of the evolving Agent Populations to follow the same distribution. We 
used the Uniform distribution to provide a control, the Normal (Gaussian) distribution to 
provide a reasonable assumption for the behaviour of a large group of users, and the Power 
distribution to represent a relatively extreme distribution in user behaviour. 

We then investigated the self-organised diversity of evolving Agent Populations through 
experimental simulations. First, we varied the user request length according to the different 
distributions, and tested whether the observed frequencies of the Agent-sequence length 
matched the expected frequencies, which we confirmed with successful chi-squared tests. Second, 
we varied the user request modularity according to the different distributions, and tested whether 
the observed frequencies for the number of Agent attributes matched the expected frequencies, 
which the chi-squared test confirmed only for the Uniform distribution. Under the Gaussian and 
Power distributions the chi-squared tests failed, most likely because the evolving Agent 
Populations were still self-organising to match the user behaviour, because at the time the 
Digital Ecosystem was sampled each user had placed an average of only ten requests. 

Collectively, the experimental results confirm that the self-organised diversity of the evolving 
Agent Populations is relative to the selection pressures of the user base, which was confirmed 
statistically for most of the experiments. So, we have determined an effective understanding and 
quantification for the self-organised diversity of the evolving Agent Populations of our Digital 
Ecosystem. The minor experimental failures, in which the Digital Ecosystem responded 
more slowly than in the other experiments, nevertheless show that there is potential to optimise 
the Digital Ecosystem, because the evolutionary self-organisation of an ecosystem is a slow 
process [29], even in the accelerated form present in our Digital Ecosystem. 



3.5 Summary and Discussion 



We have investigated the self-organising behaviour of Digital Ecosystems, because a primary 
motivation for our research is the desire to exploit the self-organising properties of biological 
ecosystems [173], which are thought to be robust, scalable architectures that can automatically 
solve complex, dynamic problems. Over time a biological ecosystem becomes increasingly 
self-organised through the process of ecological succession [29], driven by the evolutionary self- 
organisation of the populations within the ecosystem. Analogously, a Digital Ecosystem's 
increasing self-organisation comes from the Agent Populations being evolved to meet the 
dynamic selection pressures created by requests from the user base. The self-organisation of 
biological ecosystems is often defined in terms of the complexity, stability, and diversity [150], 
which we also applied in defining the self-organisation of our Digital Ecosystems. We started 
by discussing the relevant literature, including the philosophical meaning of organisation and 
of self, learning that self-organisation is context dependent, and that a system is only 
self-organising if the process or force causing the organisation is within its boundaries. So, we 
compared and contrasted alternative definitions [64, 24, 253, 48, 8, 17, 53, 95] for the 
self-organised complexity, stability, and diversity of the evolving Agent Populations, examining 
their suitability and application, because the evolving Agent Populations possess properties of 
both computing systems (e.g. agent systems) and biological systems (e.g. population dynamics), 
and the combination of these properties makes them unique. 

None of the existing definitions we considered [64, 24, 253, 48, 8] were directly applicable as a 
definition for the self-organised complexity of evolving Agent Populations, but the properties 
of Physical Complexity [8] closely matched our intuitive understanding, and so was chosen 
for further investigation. Based upon information theory and entropy, it provides a measure 
of the quantity of information in the genome of a population, relative to the environment in 
which it evolves, by calculating the entropy in the population to determine the randomness 
in the genome [8]. Reformulating Physical Complexity for an evolving Agent Population 
required consideration of the following issues: the mapping of the sequence sites to the 
Agent-sequences, and the managing of populations of variable length sequences. We then 
built upon this to construct a variant of the Physical Complexity called the Efficiency, 
because it was based on the efficiency of information storage in Physical Complexity, which 
we then used to develop an understanding of clustering and atomicity within evolving Agent 
Populations. Collectively, the experimental results confirm that Physical Complexity has been 
successfully extended to evolving Agent Populations. Most significantly, Physical Complexity 
has been reformulated algebraically for populations of variable length sequences, which we have 
confirmed experimentally through simulations. Our Efficiency definition provides a universally 
applicable macroscopic value to characterise the complexity of a population, independent 
of clustering, atomicity, length (variable or same), and size. So, we have determined an 
effective understanding and quantification for the self-organised complexity of the evolving 
Agent Populations of our Digital Ecosystem. The understanding and techniques we have 
developed have applicability beyond evolving Agent Populations, as wide as the original 
Physical Complexity, which has been applied from DNA [7] to simulations of self-replicating 
programmes [172]. 
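
For orientation, the standard fixed-length form of Physical Complexity (sequence length minus the sum of the per-site entropies, with logarithms taken to the base of the alphabet size) can be sketched as below. This is a minimal sketch under stated assumptions: it is not our reformulation for variable length sequences, it does not compute the Efficiency, and the function names are hypothetical.

```python
import math
from collections import Counter


def site_entropy(symbols, alphabet_size):
    """Entropy of one sequence site, with logarithms to base alphabet_size."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(
        (count / total) * math.log(count / total, alphabet_size)
        for count in counts.values()
    )


def physical_complexity(population, alphabet_size):
    """Sequence length minus the summed per-site entropies.

    Assumes every sequence has the same length; variable length
    populations are deliberately outside the scope of this sketch.
    """
    length = len(population[0])
    if any(len(sequence) != length for sequence in population):
        raise ValueError("this sketch only handles equal-length sequences")
    total_entropy = sum(
        site_entropy([sequence[site] for sequence in population], alphabet_size)
        for site in range(length)
    )
    return length - total_entropy
```

A fully converged population (every sequence identical) has zero entropy at each site, so its complexity equals the sequence length, while a population that is uniformly random at every site has a complexity of zero.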

None of the existing definitions we considered [64, 253, 17, 53] were directly applicable as 
a definition for the self-organised stability of evolving Agent Populations, but the properties of 
Chli-DeWilde stability [53] closely matched our intuitive understanding, and so was chosen for 
further investigation. It views a Multi-Agent System as a discrete time Markov chain (with 
potentially unknown transition probabilities) that is considered to be stable when its state, 
a stochastic process, has converged to an equilibrium distribution. Extending Chli-DeWilde 
stability to the Multi-Agent System of an evolving Agent Population required consideration 
of the following issues: the inclusion of population dynamics, and an understanding of 
population macro-states. We then built upon this to construct an entropy-based definition 
for the degree of instability (entropy of the limit probabilities), which was used to perform 
a stability analysis (similar to a sensitivity analysis [43]) of an evolving Agent Population. 
Collectively, the experimental results confirm that Chli-DeWilde stability has been successfully 
extended to evolving Agent Populations, while our definition for the degree of instability 
provides a macroscopic value to characterise the level of stability. So, we have determined 
an effective understanding and quantification for the self-organised stability of the evolving 
Agent Populations of our Digital Ecosystem. Also, our extended Chli-DeWilde stability is 
applicable to other Multi-Agent Systems with evolutionary dynamics. Furthermore, our degree 
of instability is applicable to all Multi-Agent Systems, with or without evolutionary dynamics. 
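
The idea of an entropy of the limit probabilities can be illustrated with a small Markov chain. This is a sketch under stated assumptions, not our definition in full: the transition matrix is hypothetical, the limit distribution is approximated by repeated multiplication (which assumes the chain converges), and the normalisation by the logarithm of the number of states is an assumption made here so that the value lies between zero and one.

```python
import math


def limit_probabilities(transition, steps=1000):
    """Approximate the limit distribution by iterating from a uniform start.

    Assumes the chain converges; a periodic chain would need a
    different method.
    """
    n = len(transition)
    distribution = [1.0 / n] * n
    for _ in range(steps):
        distribution = [
            sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return distribution


def degree_of_instability(probabilities):
    """Entropy of the limit probabilities, normalised to [0, 1] (assumed)."""
    n = len(probabilities)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return entropy / math.log(n)
```

A chain that settles into a single state gives a value near zero (stable), whereas a chain whose limit probabilities are spread evenly across all states gives a value near one.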

None of the existing definitions we considered [64, 24, 253, 95] were applicable as a definition for 
the self-organised diversity of evolving Agent Populations. So, we further considered the unique 
properties resulting from information-centric Digital Ecosystems being the digital counterpart of 
energy-centric biological ecosystems, creating our own definition for the self-organised diversity 
of an evolving Agent Population relative to the selection pressure provided by a user request. We 
then considered the collective self-organised diversity of the evolving Agent Populations relative 
to the global user request behaviour. Therefore, when varying a behavioural property of the 
user requests according to some distribution, we expected the corresponding property of the 
evolving Agent Populations to follow the same distribution. We used the Uniform distribution 
to provide a control, the Normal (Gaussian) distribution to provide a reasonable assumption 
for the behaviour of a large group of users, and the Power distribution to provide a relatively 
extreme distribution in user behaviour. Collectively, the experimental results confirm that the 
self-organised diversity of the evolving Agent Populations is relative to the selection pressures 
of the user base, which was confirmed statistically for most of the experiments. So, we have 
determined an effective understanding and quantification for the self-organised diversity of the 
evolving Agent Populations of our Digital Ecosystem. The minor experimental failures, 
in which the Digital Ecosystem responded more slowly than in the other experiments, nevertheless 
show that there is potential to optimise the Digital Ecosystem, because the evolutionary 
self-organisation of an ecosystem is a slow process [29], even in the accelerated form present in our 
Digital Ecosystem. 

Overall an insight has been achieved into where and how self-organisation occurs in our Digital 
Ecosystem, and what forms this self-organisation can take and how it can be quantified. The 
hybrid nature of the Digital Ecosystem resulted in the most suitable definition for the 
self-organised complexity coming from the biological sciences, and the most suitable definition 
for the self-organised stability coming from the computer sciences. However, we were unable 
to use any existing definition for the self-organised diversity, because the hybrid nature of the 
Digital Ecosystem makes it unique, and so we constructed our own definition based on variation 
relative to the user base. The (Physical) complexity definition applies to a single point in time 
of the evolving Agent Populations, whereas the (Chli-DeWilde) stability definition applies at 
the end of these instantiated evolutionary processes, while our diversity definition applies to 
the optimality of the distribution of the Agents within the evolving Agent Populations of the 
Digital Ecosystem. The experimental results have generally supported the hypotheses, and have 
provided more detail on the behaviour of the self-organising phenomena under investigation, 
showing some of their properties and, for the self-organised diversity, showing that there is 
potential for optimising the Digital Ecosystem. 

In this chapter we have investigated the emergent self-organising properties of Digital 
Ecosystems, and with the greater and more in-depth understanding we have developed and 
gained of the order constructing processes (the evolving Agent Populations), including a 
clearer identification of the potential areas and scopes for augmentation, we will attempt 
the optimisation of Digital Ecosystems in the following chapter, Chapter 4, for which the 
results here have confirmed the potential for optimisation identified in the previous chapter, 
Chapter 2. 



Chapter 4 



Optimisation of Digital Ecosystems 



In this chapter we attempt the acceleration and optimisation of Digital Ecosystems, because the 
evolutionary self-organisation of ecological succession (the formation of a mature ecosystem) is a 
slow process, even in the accelerated form present in our Digital Ecosystem. First, we consider the 
scope for optimisation identified in the previous chapters, and the potential for augmentations 
from the biological sciences. Consolidating this understanding we propose, construct and 
explore alternative augmentations to accelerate or optimise the evolutionary and ecological self- 
organising dynamics of our Digital Ecosystems. The most promising, the clustering catalyst and 
the targeted migration, were completed theoretically, before being investigated experimentally 
to determine their improvement on the evolutionary and ecological dynamics in responding 
to the needs of the user base; the first aims to optimise the evolutionary dynamics, while 
the second aims to optimise the ecological dynamics. First, the clustering catalyst operates 
upon an evolutionary process, encouraging intra-cluster crossover to accelerate reaching the 
optimal solution, directly accelerating a core operation of the Digital Ecosystem. A suitable 
existing clustering algorithm and a Physical Complexity based one were both evaluated for 
the required clustering. Second, the targeted migration operates on the ecological dynamics, 
allowing the Agents to interact for additional highly targeted migration, indirectly optimising a 
global operation of the Digital Ecosystem. Both Neural Networks and Support Vector Machines 
were evaluated for the required pattern recognition and learning functionality. We conclude 
with a summary and discussion of the achievements, including the experimental results. 

4.1 Background Theory 



We proposed that an ecosystem inspired approach would be more effective at greater scales than 
traditionally inspired approaches, because it would be built upon the scalable and self-organising 
properties of biological ecosystems [173]. So, a Digital Ecosystem, being the digital counterpart 
of biological ecosystems, possesses their scalable self-organising behaviour, properties and 
processes. However, the self-organising process of ecological succession, the orderly 
and predictable changes in the composition and structure of an ecological community in forming 
a mature ecosystem [29], is a slow one, even in the accelerated form present in our Digital Ecosystem. Therefore, 
it may be possible to accelerate and optimise this equivalent process of our Digital Ecosystem. 

The scope for optimisation and acceleration was identified and confirmed in the previous 
chapters. First, it was identified by the results of Chapter 2, specifically the ecological succession 
experiment (section 2.3.6) in which the Digital Ecosystem reached only 70% responsiveness, 
clearly showing potential for improvement. Second, it was confirmed by the results of Chapter 3, 
specifically two of the modularity scenarios of the self-organised diversity experiment (section 
3.4.2.2), in which the Digital Ecosystem responded more slowly than in the other scenarios, 
confirming the potential for improvement. Therefore, there is scope for optimising and 
accelerating the equivalent process of ecological succession in our Digital Ecosystem. 

In biological ecosystems the trajectory of ecological change can be influenced by site conditions, 
by the interactions of the species present, and by more stochastic factors such as the availability 
of colonists or seeds, or weather conditions at the time of disturbance [312]. So, ecological 
optimisation is generally concerned with the maintenance of diversity and stability, for the 
survival of populations, species, habitats, etc [323, 3, 224], and ecological acceleration is 
similarly concerned with the re-establishment of diversity and stability, through optimal species 
selection and promotion [201, 181, 336]. Therefore, biological ecosystems research has no focus 
on the type of optimisation or acceleration we require, which is unsurprising, because one of 
the fundamental differences between biological and digital ecosystems lies in the motivation 
and approach of their researchers: biological ecosystems are ubiquitous natural 
phenomena whose maintenance is crucial to our survival [20], whereas Digital Ecosystems 
are a technology engineered to serve specific human purposes. So, we are unlikely to find 
augmentations from biological ecosystems to optimise our Digital Ecosystem. 






The optimisation of Digital Ecosystems sought is not that of parameter optimisation, which 
is achievable through exploratory programming [294], but an augmentation to the Ecosystem- 
Oriented Architecture that provides a significant improvement in performance, i.e. better 
solutions for the users than the Digital Ecosystem alone could achieve. In the previous 
chapter we have investigated the emergent self-organising properties of Digital Ecosystems, 
and with the greater and more in-depth understanding we have developed and gained of the 
order constructing processes (the evolving Agent Populations), including a clearer identification 
of the potential areas and scopes for augmentation, we will now propose, construct and 
explore alternative augmentations to accelerate or optimise the evolutionary and ecological 
self-organising dynamics of our Digital Ecosystem. The most promising will be completed 
theoretically and then investigated experimentally through simulations. 



4.2 Alternative Augmentations 



Any proposed augmentation should improve the process of ecological succession [29] for our 
Digital Ecosystem. So, based on the understanding and results from the previous chapters, 
our general knowledge, and our intuition, we will now propose, construct, and explore possible 
alternative augmentations for our Digital Ecosystem that fulfil this requirement. 



4.2.1 Clustering Catalyst 



A significant proportion of user requests will be returned multiple optimal responses 
(applications), by evolving Agent Populations consisting of clusters as defined in the previous chapter. 
So, potential exists to accelerate these evolving Agent Populations with clusters by a clustering 
catalyst, which would encourage intra-cluster crossover, reducing the number of generations 
required for the clusters to reach their respective optimal genomes (applications), therefore 
directly accelerating the evolutionary self-organisation in determining applications 
(Agent-sequences) to user requests. This would accelerate the responsiveness of the Digital 
Ecosystem to the user base, and the process of ecological succession [29]. 

Crossover involves the crossing of two Agent-sequences, leading to recombination in the creation 
of new Agent-sequences, during the replication stage of the evolutionary cycle [13]. This 
augmentation would encourage intra-cluster crossover within the evolving Agent Populations, 
with the aim of directly accelerating them to find the optimal Agent-sequence(s) in fewer 
generations. As each evolving Agent Population within the Digital Ecosystem would be 
accelerated, the entire ecosystem would operate more efficiently. 
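
As a minimal sketch of how such a catalyst could bias recombination, assume that Agent-sequences are Python lists of at least two elements and that cluster labels have already been determined. The function names `one_point_crossover` and `pick_mate`, and the `bias` parameter, are illustrative assumptions and not part of the design described here.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Recombine two sequences (each of length >= 2) at one random cut point."""
    cut = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:cut] + parent_b[cut:]


def pick_mate(index, labels, rng, bias=0.9):
    """Choose the index of a crossover partner for the individual at `index`.

    With probability `bias` the partner is drawn from the same cluster
    (intra-cluster crossover); otherwise it is drawn from the whole population.
    `bias` is a hypothetical tuning parameter.
    """
    same = [i for i, lab in enumerate(labels) if lab == labels[index] and i != index]
    others = [i for i in range(len(labels)) if i != index]
    if same and rng.random() < bias:
        return rng.choice(same)
    return rng.choice(others)
```

Setting `bias` to zero recovers unbiased partner selection, so the strength of the catalyst can be varied without changing the rest of the evolutionary cycle.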




Figure 4.1: Clustering Catalyst: This would encourage intra-cluster crossover, reducing the 
number of generations required for the clusters to reach their respective optimal genomes 
(applications), therefore directly accelerating the evolutionary self-organisation in determining 
applications (Agent-sequences) to user requests. This would accelerate the responsiveness of the 
Digital Ecosystem to the user base, and the process of ecological succession [29]. 

This augmentation has considerable potential to optimise the evolving Agent Populations of 
Digital Ecosystems, but to be effective the determination of clusters needs to be computationally 
negligible, otherwise the overall effect would be counterproductive: while evolving Agent 
Populations would find the optimal application (Agent-sequence) within fewer generations, 
more time overall would be required. 

Our work on clustering with Physical Complexity, from section 3.2.2.3, may prove useful for 
this augmentation. 

4.2.2 Replacement Aggregator 

Evolutionary computing [90] was chosen exclusively for the aggregation (combinatorial 
optimisation [240]) of the Agents into optimal Agent-sequences (applications), without comparison to 
other techniques, because our focus was creating the digital counterpart of biological ecosystems. 
If we were to assume it might not be the optimal technique, we could consider a replacement 
aggregator to perform the aggregation of the Agents with an alternative technique, potentially 
accelerating the responsiveness of the Digital Ecosystem to the user base, and the process of 
ecological succession [29]. 




Figure 4.2: Replacement Aggregator: This would work by treating the evolving Agent 
Population, the embodiment of evolutionary computing in Digital Ecosystems, as an 
interchangeable module, and considering a replacement aggregator to perform the aggregation 
of the Agents with an alternative technique, potentially accelerating the responsiveness of the 
Digital Ecosystem to the user base, and the process of ecological succession [29]. 

This augmentation would work by treating the evolving Agent Population, the embodiment of 
evolutionary computing in Digital Ecosystems, as an interchangeable module. It also assumes 
a more effective aggregator can be found to perform the combinatorial optimisation [240] that 
occurs in response to a user request, on the set of Agents and Agent-sequences available from 
the Agent-pool of a Habitat. As each Agent aggregation process within the Digital Ecosystem 
would be accelerated, the entire ecosystem would operate more efficiently. 

This augmentation could even allow for a range of available aggregators, choosing the most 
effective depending on the user, or on a case-by-case basis. However, replacing the evolutionary 
mechanism of the Digital Ecosystem with an alternative technique would weaken its Ecosystem- 
Oriented Architecture; potentially risking the loss of valuable behaviour, such as emergent self- 
organisation, scalability and sustainability, imbibed from creating the digital counterpart of 
biological ecosystems. So, while the modular nature of the Ecosystem-Oriented Architecture 
of Digital Ecosystems makes this augmentation possible, it would not be prudent. 



4.2.3 Agent-Pool Aggregation 



The appealing vision of the Agent-pool aggregation is that of the Agents intelligently 
recombining with one another, joining and leaving Agent-sequences of their own accord to 
improve the responsiveness of the Digital Ecosystem, allowing for the creation of potentially 
useful applications (Agent-sequences) or partial applications inside the Agent-pools, increasing 
and optimising the recombination that occurs globally within the Digital Ecosystem. So, 
it would help to optimise the Agent-sequences at the Agent-pools of the Habitats, which 
would in turn optimise the evolving Agent Populations, as they make use of the Agent-pools 
when determining applications (Agent-sequences) to user requests. This would accelerate the 
process of ecological succession [29], and so the responsiveness of the Digital Ecosystem to the 
user base. 

Figure 4.3: Agent-Pool Aggregation: This augmentation allows for the creation of potentially 
useful applications (Agent-sequences) or partial applications inside the Agent-pools, increasing 
and optimising the recombination that occurs globally within the Digital Ecosystem. So, it would 
help to optimise the Agent-sequences at the Agent-pools of the Habitats, which would in turn 
optimise the evolving Agent Populations, as they make use of the Agent-pools. 

This augmentation would work by providing the Agents with the opportunity to interact inside 
the Agent-pools and the ability to determine whether to recombine with one another, outside the 
evolutionary optimisation of the evolving Agent Populations. For the Agents to judge potential 
re-combinations, they will require an understanding of the context in which they would operate, 
most importantly the past user requests of the Habitat where the recombination would occur. 
So, this augmentation would optimise the set of Agents and Agent-sequences at the Agent- 
pools, and therefore indirectly optimise and accelerate the evolving Agent Populations within 
the Digital Ecosystem. As each evolving Agent Population within the Digital Ecosystem would 
be accelerated, the entire ecosystem would operate more efficiently. 

This augmentation would strengthen the Agent concept within the Ecosystem-Oriented 
Architecture of Digital Ecosystems, endowing the individual Agents with some intelligence 
and control over their behaviour. However, a sophisticated process would be required for the 
Agents to evaluate a potential recombination, considering their descriptions with other Agents' 
descriptions collectively, within the context of the past user requests handled by the Habitat 
where the recombination is to occur. Additionally, a scalable mechanism would be required 
to determine which re-combinations the Agents should evaluate, because generally it will be 
impractical to evaluate all the re-combinations possible at any one time. Interestingly, the 
effectiveness of this augmentation relies on the local interactions of the Agents, producing an 
emergent global optimising effect on the evolving Agent Populations to accelerate the ecological 
succession of a Digital Ecosystem. 

4.2.4 Targeted Migration 

The self-organised diversity experiments from the previous chapter showed that the Digital 
Ecosystem can be slow to optimally distribute the Agents, within the Habitat network, relative 
to the user request behaviour. So, potential exists to optimise the distribution of the Agents 
within the Habitat network, through additional targeted migration of the Agents, which would 
indirectly optimise the evolving Agent Populations. The migration probabilities between the 
Habitats produce the existing passive Agent migration, allowing the Agents to spread in the 
correct general direction within the Habitat network, based primarily upon success at their 
current location. This augmentation will work in a more active manner, allowing the Agents 
highly targeted migration to specific Habitats, in addition to the generally directed passive 
migration. It will help to optimise the Agents found at the Agent-pools of the Habitats, which 
would in turn optimise the evolving Agent Populations as they make use of the Agent-pools 
when determining applications (Agent-sequences) to user requests. This would accelerate the 
process of ecological succession [29], and therefore the responsiveness of the Digital Ecosystem 
to the user base. 

Figure 4.4: Targeted Migration: This augmentation would optimise the distribution of the 
Agents within the Habitat network, through additional targeted migration of the Agents, helping 
to optimise the Agents found at the Agent-pools of the Habitats, which would in turn optimise 
the evolving Agent Populations. This would accelerate the process of ecological succession [29], 
and therefore the responsiveness of the Digital Ecosystem to the user base. 

This augmentation would work by providing the Agents with the opportunity to interact inside 
the Agent-pools, outside of the evolutionary optimisation of the evolving Agent Populations, to 
determine if they are functionally similar based on their semantic descriptions. Similar Agents 
will compare their migration histories to determine Habitats where they could find a niche (i.e. 
be valuable). This would lead to additional highly targeted migration of Agents throughout 
the Habitat network, optimising the set of Agents and Agent-sequences at the Agent-pools, 
and therefore indirectly optimising and accelerating the evolving Agent Populations within the 
Digital Ecosystem. As each evolving Agent Population within the Digital Ecosystem would be 
accelerated, the entire ecosystem would operate more efficiently. 

This augmentation would strengthen the Agent concept within the Ecosystem-Oriented 
Architecture of Digital Ecosystems, endowing the individual Agents with some intelligence 
and control over their behaviour. Interestingly, the effectiveness of this augmentation relies 
on the local interactions of the Agents, producing an emergent global optimising effect on the 
evolving Agent Populations to accelerate the ecological succession of a Digital Ecosystem. 



4.2.5 Choice of Augmentation 

Figure 4.5: Effect of The Proposed Augmentations: The effect of the different augmentations 
on the evolutionary dynamics (the evolving Agent Populations) and the ecological dynamics (the 
Habitats). This separation of concerns is an artificial construct, but useful in summarising the 
potential of the different augmentations, before we decide upon which to pursue. 



The question of which of the alternative augmentations are most promising, and therefore 
which we should pursue to theoretical completion and then experimental confirmation, is not 
obvious. The evolving Agent Populations are the embodiment of evolutionary computing [90] 
in Digital Ecosystems, while the Habitats are the embodiment of the ecology-based computing 
we have developed for Digital Ecosystems. So, we will start by considering the effect of the 
different augmentations on the evolutionary dynamics (the evolving Agent Populations) and 
the ecological dynamics (the Habitats), as shown in Figure 4.5. This separation of concerns is 
an artificial construct, but useful in summarising the potential of the different augmentations, 
before we decide upon which to pursue. 

The clustering catalyst has potential to accelerate evolving Agent Populations with clusters, 
and therefore the process of ecological succession [29]. However, a computationally negligible 
technique is needed to determine the required clustering, else the overall effect will be 
counterproductive. Nevertheless, we will pursue this augmentation further; first to theoretical 
completion, and then to experimental simulations for confirmation. 

The replacement aggregator could prove effective for using a range of techniques when finding 
the optimal aggregation of the Agents into Agent-sequences in response to user requests. 
However, it would weaken the Ecosystem-Oriented Architecture of Digital Ecosystems and 
presume more effective techniques than evolutionary computing [90] can be found for the 
combinatorial optimisation [240] of the Agent aggregation. The first point would obviously 
be undesirable, potentially risking the loss of valuable behaviour, such as emergent self- 
organisation, scalability and sustainability, imbibed from creating the digital counterpart 
of biological ecosystems. Regarding the second point, it has been shown [270] that, when the 
combinatorial optimisation of Agent aggregation is treated as the weighted set-cover problem, 
evolutionary computing and simulated annealing are more effective than steepest descent, Tabu 
search, and random search, and that evolutionary computing is more widely applicable (without 
performance degradation) than simulated annealing [270]. So, while there may be other 
applicable techniques that have not been evaluated, the certainty of success with this 
augmentation is considerably reduced, and therefore it will not be pursued further. 

The Agent-pool aggregation could prove very effective in optimising and accelerating the process 
of ecological succession [29], but the computational cost would be considerable, most likely 
making it impractical. So, while we are optimistic regarding the potential success of this 
augmentation theoretically, the experimental impracticality leads us not to pursue it any 
further. 

The targeted migration also has considerable potential to optimise and accelerate the process of 
ecological succession [29], by improving the migration of Agents through the Habitat network 
of the Digital Ecosystem. Also, it would directly address the scope for optimisation identified 
from the self-organised diversity experiment of section 3.4.2.2. Furthermore, it is the only 
augmentation to affect the ecological dynamics directly, which makes it desirable to pursue, 
because while evolution may be well understood in computer science under the auspices of 
evolutionary computing [90], ecology had not been widely explored until our efforts. So, 
there is inherently more potential to improve the ecological dynamics than the evolutionary 
dynamics, and therefore more potential in this augmentation than the others. However, creating 
a mechanism for the Agents to determine if they are functionally similar with one another 
based on their semantic descriptions will be a challenge. Nevertheless, we will pursue this 
augmentation further; first to theoretical completion, and then to experimental simulations for 
confirmation. 

4.3 Clustering Catalyst 

The clustering catalyst will directly optimise the evolutionary self-organisation of evolving 
Agent Populations with clusters, by encouraging intra-cluster crossover. Crossover involves the 
crossing of two Agent-sequences in the creation of new Agent-sequences, during the replication 
stage of the evolutionary cycle [13]. Theoretical completion of the clustering catalyst requires 
consideration of how best to determine the clusters within an evolving Agent Population, so 
we will now consider suitable clustering algorithms. 

4.3.1 Clustering 

We have understood clustering, within the context of evolving Agent Populations, as the 
amassing of same or similar sequences around an optimum genome [29], but more generally 
clustering is the classification of objects into different groups, or more precisely, the partitioning 
of a data set into subsets (clusters), so that the data in each subset share some common trait, 
often proximity according to a distance measure [139]. If the number of clusters k is not 
apparent from prior knowledge, several methods are available for its determination [210]. For 
our simulations we will make use of prior knowledge to determine the number of clusters k, 
because if available it is obviously the most effective method, and because our focus is on 
determining the effectiveness of the clustering catalyst. Naturally, the most suitable method 
for determining the number of clusters can be investigated if the clustering catalyst proves to 
be effective. 

An important step in any clustering is to select a distance measure, which calculates the 
similarity of two elements [139]. We will use a distance measure based on our simulated fitness 
function from section 2.3.3, because it is itself based on a distance metric. So, given two 
Agent-sequences $A$ and $B$, consisting of sets of attributes $a_1, a_2, \ldots$ and 
$b_1, b_2, \ldots$ respectively, the distance between them will be 

\[ \mathrm{distance}(A, B) = \sum_{b \in B} |b - a| \]

where $a$ is the member of $A$ such that the difference to the required attribute $b$ is minimised. 
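
To make the measure concrete, the following sketch assumes that each attribute is a single number and that the distance sums, over every attribute b of B, the gap to its closest attribute a in A. Both the numeric representation and the function names are assumptions made for illustration. Note that a measure of this form is not symmetric, so the sketch also includes one possible symmetrised variant (the mean of the two directions), which is likewise an assumption rather than something taken from the text.

```python
def distance(seq_a, seq_b):
    """Sum, over each attribute b of B, of the gap to the closest attribute a of A."""
    return sum(min(abs(b - a) for a in seq_a) for b in seq_b)


def symmetric_distance(seq_a, seq_b):
    """Hypothetical symmetrised variant: the mean of the two directed distances."""
    return (distance(seq_a, seq_b) + distance(seq_b, seq_a)) / 2


# Example: B = [2, 9] is matched against A = [1, 5, 9].
# 2 is closest to 1 (gap 1) and 9 is matched exactly (gap 0).
print(distance([1, 5, 9], [2, 9]))            # 1
print(distance([2, 9], [1, 5, 9]))            # 4
print(symmetric_distance([1, 5, 9], [2, 9]))  # 2.5
```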

A range of clustering algorithms is available, to the extent that taxonomies have been 
proposed [113, 142] for their classification. The top-level classification is between 
hierarchical and partitional algorithms, both of which we shall now explore. 

4.3.1.1 Hierarchical Clustering 

Hierarchical clustering builds (agglomerative), or breaks up (divisive), a hierarchy of clusters 
[142]. The traditional representation of this hierarchy is a tree (called a dendrogram), with 
individual elements at one end and a single cluster containing every element at the other [139]. 
Agglomerative algorithms begin at the leaves of the tree, whereas divisive algorithms begin at 
the root [142]. 

Agglomerative clustering starts with all the objects as individual clusters, which are then 
merged according to their similarities until all are fused into a single cluster, with the most 
similar objects being grouped first [139]. The similarity criterion is determined by the linkage 
analysis, for which there are three common forms [239]: 

• Single-link (nearest neighbour or minimum distance) [291]: is obtained by fusing clusters 
according to the distance between their nearest members. 

• Complete-link (farthest neighbour or maximum distance) [151]: is obtained by fusing 
clusters according to the distance between their farthest members. 

• Average-link (average distance) [239]: is obtained by fusing clusters according to the 
average distance between pairs of members in the respective sets. 
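
The three linkage criteria listed above can be written directly as functions of two clusters and a distance measure. This is a minimal sketch: clusters are assumed to be plain Python lists, and `dist` is any caller-supplied pairwise distance function.

```python
from itertools import product


def single_link(c1, c2, dist):
    """Distance between the nearest members of the two clusters."""
    return min(dist(x, y) for x, y in product(c1, c2))


def complete_link(c1, c2, dist):
    """Distance between the farthest members of the two clusters."""
    return max(dist(x, y) for x, y in product(c1, c2))


def average_link(c1, c2, dist):
    """Mean distance over all cross-cluster pairs of members."""
    pairs = [dist(x, y) for x, y in product(c1, c2)]
    return sum(pairs) / len(pairs)
```

For the clusters `[0, 1]` and `[4, 6]` with absolute difference as the distance, the cross-cluster distances are 4, 6, 3 and 5, so the three criteria give 3, 6 and 4.5 respectively.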

Most hierarchical agglomerative clustering algorithms are variants of the single-link, 
complete-link and average-link algorithms [142]. Most notable is the minimum-variance algorithm 
[324], which is a variant of the average-link algorithm, except that instead of minimising an 
average distance it minimises a squared distance weighted by cluster size [220]. The single-link 
and complete-link algorithms are the most popular [142]. However, the single-link algorithm 
suffers from a chaining effect [225], which tends to produce clusters that are straggly or 
elongated [142]. In contrast, the complete-link algorithm produces tightly bound or compact 
clusters [15]. The average-link algorithm [239] is designed to reduce the dependence of the 
cluster-linkage criterion on extreme values, such as the most similar or dissimilar of the 
single-link and complete-link algorithms [239], and results in clusters that tend to have 
approximately equal within-cluster variability [167]. The minimum-variance algorithm tends to 
join clusters with a small number of observations first, being strongly biased to producing 
clusters with the same number of observations, and therefore is very sensitive to outliers [211]. 

Divisive clustering methods start with one cluster containing all the objects, which are 
successively separated into smaller subgroups until the number of clusters equals the number of 
objects [142]. There are two forms: monothetic, which divides the data by the possession of a 
single specified attribute, and polythetic, where divisions are based on several attributes [142]. 
Agglomerative algorithms make clustering decisions based on local patterns without initially 
considering the global distribution, and these early decisions cannot be undone; while divisive 
clustering benefits from complete information about the global distribution when making top- 
level partitioning decisions [190]. It also has the advantage of being more efficient if we do 
not generate a complete hierarchy all the way down to the individual objects [190]. However, 
divisive clustering is conceptually more complex than agglomerative clustering, since a second 
flat clustering algorithm is required as a subroutine [190]. There are also graph-theoretic divisive 
clustering algorithms, with the best-known based on the construction of the minimal spanning 
tree of the data, deleting the edges with the largest lengths to generate the clusters [340]. 

4.3.1.2 Partitional Clustering 

A partitional clustering algorithm determines a single partition of the data [142], instead of 
a clustering structure such as the dendrogram produced by a hierarchical algorithm [139]. 
Partitional algorithms usually produce clusters by optimising a criterion function defined either 
locally (on a subset of the patterns) or globally (defined over all the patterns) [142]. A 
combinatorial search of the set of possible labellings, to determine the optimum value of a 
criterion, is clearly computationally prohibitive, and so in practice the algorithm is run multiple 
times with different starting states, with the best configuration being used as the output of 
the clustering [142]. Partitional algorithms have advantages in applications involving large 
data sets for which the construction of a dendrogram is computationally prohibitive [142]. A 
problem accompanying partitional algorithms is choosing the number of desired output clusters 
beforehand [84], which is not required for hierarchical clustering. There are many forms of 
partitional clustering algorithms, including mixture-resolving [139], mode-seeking [139], nearest 
neighbour [180], fuzzy clustering [339], artificial neural networks [280], and others [142]. 

The most intuitive and frequently used criterion function in partitional clustering algorithms 
is the squared error criterion, which tends to work well with isolated and compact clusters 
[142]. The most commonly used algorithm employing a squared error criterion is the k-means 
algorithm [187], popular because it is easy to implement [147]. It starts with a random initial 
partition and keeps reassigning the patterns to clusters, based on the similarity between a 
pattern and the cluster centres, until a convergence criterion is met [187]. However, a major 
problem with this algorithm is that it is sensitive to the selection of the initial partition, and 
may converge to a local optimum of the criterion function value if the initial partition is not 
properly chosen [105]. 
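
The reassignment loop described above can be sketched as follows. This is an illustrative version only: points are assumed to be tuples of numbers, and for simplicity the sketch seeds the clusters with k randomly chosen data points rather than a random initial partition, which is a common variation and an assumption here. The function name and parameters are hypothetical.

```python
import random


def k_means(points, k, rng, max_iter=100):
    """Basic k-means on tuples of numbers; returns (labels, centres)."""
    centres = rng.sample(points, k)  # assumption: seed with k random data points
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centre (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centres[j])))
            for pt in points
        ]
        if new_labels == labels:  # convergence: no point changed cluster
            break
        labels = new_labels
        # Move each centre to the mean of the points currently assigned to it.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres
```

Because the outcome depends on the initial centres, a caller would typically run the function several times with different seeds and keep the partition with the lowest squared error.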



4.3.1.3 Choice of Algorithm 



The choice of the optimal clustering algorithm very much depends on the structure of the data, 
because clustering is subjective, such that the same data can be partitioned differently for 
different purposes [142]. We also require a clustering algorithm with a negligible computational 
cost for the clustering catalyst to be effective. So, we choose a hierarchical clustering algorithm 
over a partitional one, as it is more appropriate for small data sets [142], such as expected 
from an evolving Agent Population with clusters. We choose a hierarchical agglomerative 
clustering algorithm over a divisive one, because it is conceptually simpler [190], and because 
the efficiency advantage [190] of a divisive algorithm would be negligible, given the small size of 
the expected data set. Finally, we choose a hierarchical agglomerative average-link clustering 
algorithm of time complexity $O(n^2 \log n)$ [190], over a single-link one of $O(n^2)$ time 
complexity, a complete-link one of $O(n^2 \log n)$ time complexity, or a minimum-variance one of 
$O(n^2)$ time complexity [71], because the average-link algorithm is designed to reduce the 
dependence of the cluster-linkage criterion on extreme values, such as the most similar or 
dissimilar of the single-link and complete-link algorithms [239]; and because the 
minimum-variance algorithm is biased to producing clusters with the same number of objects [211], 
which would be problematic for clusters emerging over the generations. Any difference in 
execution time of the algorithms would be minimal, despite the different algorithmic time 
complexities, because of the small size of the expected data set n, the Population size. So, we 
choose a hierarchical agglomerative average-link clustering algorithm [239] for our clustering 
catalyst. 
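
For illustration, a naive average-link agglomerative procedure that stops once k clusters remain might look as follows. This is a sketch under stated assumptions: the function name `average_link_clustering` and the caller-supplied `dist` function are hypothetical, and this straightforward version recomputes every pairwise average at each merge, so it is slower than the average-link complexity cited above.

```python
def average_link_clustering(items, k, dist):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[item] for item in items]  # start with one cluster per item
    while len(clusters) > k:
        best = None  # (average distance, index i, index j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist(x, y) for x in clusters[i] for y in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # fuse the closest pair
        del clusters[j]
    return clusters
```

With a small Population, as expected here, the extra cost of the naive version is unlikely to matter, which is why the simpler form is shown.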



4.3.2 Physical Complexity Clustering 

We also considered a clustering algorithm based on our extended Physical Complexity from 
Chapter 3, because clustering is subjective in nature [142], and our extended Physical 
Complexity was developed, in section 3.2.2.3, to understand the clustering of evolving Agent 
Populations. In our algorithm, with the number of clusters k determined also from prior 
knowledge, we first sort the evolving Agent Population, then process its Agent-sequences 
linearly, adding each to the cluster that maximises the Efficiency $E_c$ (3.42) of the Population. 
The pre-assignment sorting ensures that the cores of the clusters are established for the 
Efficiency $E_c$ to have the necessary sensitivity when assigning the Agent-sequences of greater 
uniqueness. The pseudocode for our algorithm is shown in Figure 4.6. 



clusters[k]; 

group duplicate Agent-sequences within the Population 
order the groups within Population by greatest size 
for each Agent-sequence in the ordered Population 
    if a duplicate and not first instance 
        then: assign to same cluster as first instance 
        else: assign to the cluster that maximises the Efficiency E_c of the Population 
    end if 
end for 

Figure 4.6: Pseudo-Code for Physical Complexity Clustering: With the number of clusters k 
determined from prior knowledge, we first sort the evolving Agent Population, then process its 
Agent-sequences linearly, adding each to the cluster that maximises the Efficiency $E_c$ (3.42) 
of the Population. The pre-assignment sorting is for the Efficiency $E_c$ to have the necessary 
sensitivity when assigning the Agent-sequences of greater uniqueness. 

Our algorithm will be computationally negligible, because, based on the pseudo-code, it will 
have a time complexity of $O(n^2)$, where n is the Population size. It will also be as accurate 
as an exhaustive search, which would have an exponential time complexity, because of the 
sensitivity of the Efficiency $E_c$ when assigning the Agent-sequences to the clusters. 
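
The greedy structure of the algorithm can be sketched in Python as below. The Efficiency measure of equation (3.42) is not reproduced here, so the sketch takes it as a caller-supplied function `efficiency(clusters)` that scores a candidate clustering; that parameter, the function name, and the requirement that Agent-sequences be hashable (for example, tuples) are all assumptions made for illustration.

```python
from collections import Counter


def physical_complexity_clustering(population, k, efficiency):
    """Greedy assignment: largest duplicate groups first, each to its best cluster.

    `efficiency` is a placeholder for the Efficiency measure; it must accept a
    list of k clusters (lists of sequences) and return a score to maximise.
    """
    groups = Counter(population)  # group duplicate Agent-sequences
    clusters = [[] for _ in range(k)]
    for seq, count in groups.most_common():  # groups ordered by greatest size
        def score(c):
            # Score the clustering that results from adding seq to cluster c.
            trial = [cl + [seq] if i == c else cl for i, cl in enumerate(clusters)]
            return efficiency(trial)

        best = max(range(k), key=score)
        # Duplicates follow the first instance into the same cluster.
        clusters[best].extend([seq] * count)
    return clusters
```

Any function that scores a clustering can be passed as `efficiency`, which makes it straightforward to compare the behaviour of this greedy scheme under different scoring measures.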

Now that the clustering catalyst is theoretically complete, with two alternative clustering 
algorithms, we can confirm its effect experimentally through simulations. 

4.4 Targeted Migration 

The targeted migration will directly optimise the ecological migration, and therefore indirectly 
complement the evolutionary self-organisation of the evolving Agent Populations, through the 
highly targeted migration of the Agents to their niche Habitats. The migration probabilities 
between the Habitats produce the existing passive Agent migration, allowing the Agents to 
spread in the correct general direction within the Habitat network, based primarily upon success 
at their current location. The targeted migration will work in a more active manner, allowing the 
Agents highly targeted migration to specific Habitats, based upon their interaction with one 
another to discover Habitats where they could be valuable (i.e. find a niche). Theoretical 
completion of the targeted migration requires further consideration of how it will operate, 
including its effect on the Agent life-cycle, and a suitable pattern recognition [140] technique 
for the required similarity recognition. 

Figure 4.7: Agent Life-Cycle With Targeted Migration: The Agent life-cycle, defined in section 
2.2.4.3, will change to support the targeted migration, as shown by the blue circle. Specifically, 
there will be more opportunities for Agent migration, but more importantly these opportunities 
will be for targeted migration, which will help to optimise the set of Agents found at the 
Habitats. 

The targeted migration will occur when users deploy their services, specifically when deploying 
their representative Agents to their Habitats within the Digital Ecosystem, and upon the 
execution of applications (groups of services), specifically the resulting passive migration of 
their representative Agent-sequences between the Habitats. The Agent-sequences arriving at 
Habitats, with respect to the targeted migration, will be treated as individual Agents arriving 
at the Habitats. So, an Agent arriving at a Habitat interacts one-on-one with Agents already 
present within the Agent-pool of the Habitat, and upon determining functional similarity, 
based upon comparing their semantic descriptions, will share other Habitats successfully visited 
from their respective migration histories. An Agent migration history, as defined in section 
2.2.2.1, is the migratory path of the Agent through the Habitat network, including its use at 
the Habitats visited. So, similar Agents can share their migration histories to discover new 
Habitats where they could be valuable, and then use targeted migration (via a copy, and not 
a move) to explore the most promising of the recently acquired Habitats. This will allow 
successfully interacting Agents to target specific Habitats where they will potentially be useful, 
but risks potentially infinite targeted migration, because targeted migration itself can lead to 
further targeted migration. So, each Agent will require a dynamic targeted migrations counter, 
which defines the number of permitted targeted migrations of the Agent. This counter will be 
incremented upon an Agent's execution in response to a user request, and decremented upon 
performing a targeted migration. 
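
The counter mechanism and the sharing of migration histories can be sketched as follows. This is a minimal illustration under stated assumptions: the `Agent` class, its field names, the representation of a migration history as a mapping from Habitat identifier to number of uses, and the choice that a copy starts with a counter of zero are all hypothetical. The similarity check between semantic descriptions is assumed to have already succeeded before `candidate_habitats` is called.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    description: str
    history: dict = field(default_factory=dict)  # Habitat id -> number of uses there
    migrations_left: int = 0                     # targeted migrations counter

    def on_executed(self, habitat):
        """Record a use at `habitat` and earn one permitted targeted migration."""
        self.history[habitat] = self.history.get(habitat, 0) + 1
        self.migrations_left += 1

    def targeted_migrate(self, habitat):
        """Return a copy destined for `habitat`, or None if the counter is spent."""
        if self.migrations_left <= 0:
            return None
        self.migrations_left -= 1
        copy = Agent(self.description, dict(self.history), 0)  # assumption: copy starts at 0
        copy.history.setdefault(habitat, 0)  # visited, but not yet used there
        return copy


def candidate_habitats(agent, other):
    """Habitats where `other` was used but `agent` has not been, most used first."""
    new = {h: n for h, n in other.history.items() if n > 0 and h not in agent.history}
    return sorted(new, key=new.get, reverse=True)
```

Because a copy can only earn further targeted migrations by being executed at its new Habitat, chains of targeted migration stop at Habitats where the Agent is not used.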

The Agent life-cycle, defined in section 2.2.4.3, will change to support the targeted migration, 
as shown in Figure 4.7 by the blue circle. Specifically, there will be more opportunities for 
Agent migration, but more importantly these opportunities will be for targeted migration, 
which will help to optimise the set of Agents found at the Habitats, and therefore support the 
evolving Agent Populations created in response to user requests for applications. The targeted 
migration will essentially short-circuit the hierarchical topology of the Habitat network, which 
is what allows it to specialise and localise to communities, providing specific solutions to specific 
requests from specific users. However, the targeted migration will also reinforce the hierarchical 
topology of the Habitat network, because targeted migration between connected Habitats will 
accelerate the existing migration of Agents, while that between unconnected Habitats will assist 
the Digital Ecosystem in supporting emerging communities. So, the targeted migration will 
help strengthen and catalyse the formation of clusters within the Habitat network, and will 
also assist in locating Habitats within the correct clusters. Therefore, the optimisation of the 
Digital Ecosystem will be a global emergent effect resulting from the local interactions of the 
Agents, allowing for niches to be fulfilled faster and so accelerating the process of ecological 
succession [29]. Also, the Digital Ecosystem will adapt faster to changing environmental 
conditions (e.g. changes in the request behaviour of user communities). In biological terms, 
the targeted migration endows the Agents with a form of reciprocal altruistic behaviour [311], 
consistent with the Agent paradigm of the Ecosystem-Oriented Architecture. 

4.4.1 Similarity Recognition 

For the targeted migration to work successfully an effective technique will be required for the 
similarity recognition between the semantic descriptions of two Agents. Each Agent will have an 
embedded similarity recognition component to maintain the consistency of the Agent paradigm 
of Ecosystem-Oriented Architectures. So, the Agents will interact one-on-one to determine 
functional similarity based upon their semantic descriptions, using their embedded similarity 
recognition components, with each of the two interacting Agents determining similarity for 
themselves. Again, this is to maintain the consistency of the Agent paradigm. Similarity 
recognition between the semantic descriptions of two Agents will require some form of pattern 
recognition, because there is no single standard for the semantic description of services [42], 
and adopting one over the others would be inconsistent with the inclusive nature of Digital 
Ecosystems. So, we will now consider the field of pattern recognition to determine suitable 
techniques for the similarity recognition components to be embedded within the Agents. 

Pattern recognition aims to classify data (patterns) based on a priori knowledge or on statistical 
information extracted from the data [262]. Pattern recognition requires a sensor or sensors 
for data acquisition, a pre-processing technique, a data representation scheme, and a decision 
making model [140]. Also, learning from a set of examples (training set) is an important and 
desirable feature of most pattern recognition systems [262]. The four best known approaches 
for pattern recognition are [140]: 

• Template Matching 

• Statistical Classification 

• Structural Matching 

• Neural Networks 

These approaches are not necessarily independent, and sometimes the same pattern recognition 
method exists with different interpretations [140]. For example, attempts have been made to 
design hybrid systems involving multiple approaches, such as the notion of attributed grammars 
which unifies structural and statistical pattern recognition [102]. 

4.4.1.1 Template Matching 

Template Matching is the simplest and earliest approach to pattern recognition, and involves 
a generic operation to determine the similarity between two entities (points, curves, or shapes) 
of the same type [140]. A template (typically, a 2D shape) is available, with the pattern 
to be recognised being matched against the stored template, while considering all allowable 
changes in translation, rotation and scale [140]. The similarity measure, often a correlation, may 
be optimised based on a training set, and often the template itself is defined from a training 
set [140]. 

Template Matching is computationally demanding, but the availability of ever faster processors 
has made it more feasible [140]. While effective for some application domains, it has several 
disadvantages [233]. For instance, it performs poorly if the patterns are distorted by the 
imaging process or a viewpoint change, or if there are large intra-class variations among the 
patterns [140]. Deformable template models [121] or rubber sheet deformations [16] can help 
to compensate when the deformation cannot be easily explained or modelled directly. 

4.4.1.2 Statistical Classification 

In Statistical Classification each pattern is represented in terms of d features or measurements, 
and is viewed as a point in a d-dimensional feature space, with the goal being to choose 
those features that allow pattern vectors belonging to different categories to occupy compact 
and disjoint regions in the d-dimensional feature space [328]. The effectiveness of the chosen 
features is determined by how well patterns from different classes can be separated [140]. The 
decision boundaries of the d-dimensional feature space can be determined from the probability 
distributions of the patterns belonging to each class, which must be specified or learnt [80, 86]. 

One can also take a discriminant analysis based approach to classification, in which a 
parametric form of a decision boundary is specified, and then the best decision boundary of the 
specified form is found based on the classification of training patterns [140]. Such boundaries 
can be constructed using, for example, a mean squared error criterion [328]. These direct 
boundary construction approaches are supported by the philosophy [317] that if you possess a 
restricted amount of information for solving some problem, try to solve the problem directly 
and never solve a more general problem as an intermediate step, because it is possible that the 
available information is sufficient for a direct solution but insufficient for solving a more general 
intermediate problem. 

Statistical approaches are generally characterised by having an explicit underlying probability 
model, which provides a probability of being in each class and not just a classification, and 
hence some human intervention is assumed regarding variable selection and transformation, 
and the overall structuring of the problem [206]. 

4.4.1.3 Structural Matching 

In Structural Matching a hierarchical perspective is adopted, where a pattern is viewed as 
being composed of simple sub-patterns, which are built from yet simpler sub-patterns [246], 
and therefore it is applicable to many recognition problems involving complex patterns [140]. 
The elementary (simplest) sub-patterns to be recognised are called primitives, with the given 
complex pattern to be represented in terms of the interrelationships between these primitives 
[246]. Where the structure is syntactic, a formal analogy is drawn between the structure of 
patterns and the syntax of language. So, the primitives are viewed as the alphabet of the 
language, and the patterns are viewed as sentences generated according to the grammar of 
the language [102]. Thus, a large collection of complex patterns can be described by a small 
number of primitives and grammatical rules, which must be inferred from the available training 
samples [140]. 

Structural pattern recognition is intuitively appealing because, in addition to classification, it 
also provides a description of how the given pattern is constructed from the primitives [140]. 
It is used in situations where the patterns have a definite structure that can be captured by a 
set of rules, such as electrocardiogram waveforms [310], textured images [129], and the shape 
analysis of contours [179]. However, the implementation of structural approaches leads to many 
difficulties, including the segmentation of noisy patterns (to detect primitives) and the inference 
of grammar from training data [140]. There can also be a combinatorial explosion of possibilities 
to be investigated, demanding large training sets and significant computational effort [248]. 

4.4.1.4 Neural Networks 

Neural Networks (NNs) can be viewed as massively parallel computing systems consisting of 
an extremely large number of simple processors with many interconnections [262]. NN models 
attempt to use certain organisational principles (such as learning, generalisation, adaptivity, 
fault tolerance, distributed representation, and computation) in a network of weighted directed 
graphs, in which the nodes are artificial neurons, and the directed edges (with weights) are 
connections between the neuron outputs and inputs [262]. The main characteristics of NNs 
are their ability to learn complex nonlinear input-output relationships, use sequential training 
procedures, and adapt themselves to the data [140]. 

The most commonly used family of NNs for pattern classification tasks is the feed-forward 
network, including multilayer perceptrons, which are organised into layers and have 
unidirectional connections between the layers [140]. Another popular network is the Self-Organising 
Map, or Kohonen-Network [153], which is often used for feature mapping [140]. The increasing 
popularity of NN models to solve pattern recognition problems has been primarily because of 
their low dependence on domain-specific knowledge (relative to model-based and rule-based 
approaches) and the availability of efficient learning algorithms [140]. The learning process 
involves updating the network architecture and connection weights so that a network can 
efficiently perform a specific classification [262]. 

NNs provide a suite of nonlinear algorithms for feature extraction (using hidden layers) and 
classification (e.g. multilayer perceptrons) [140]. In addition, existing feature extraction 
and classification algorithms can be mapped onto NN architectures for efficient (hardware) 
implementation [39]. Despite the seemingly different underlying principles, most of the 
well-known NN models are implicitly equivalent or similar to classical statistical pattern recognition 
methods [140]. However, NNs offer several advantages, such as unified approaches for feature 
extraction and classification, and flexible procedures for finding good, moderately nonlinear 
solutions [140]. 

4.4.1.5 Support Vector Machines 

One of the most interesting recent developments in classifier design is the introduction of the 
Support Vector Machine (SVM) [316], which is primarily a two-class classifier, and therefore 
highly suitable for the required similarity recognition component of our targeted migration. It 
uses an optimisation criterion that is the width of the margin between the classes, i.e. the empty 
area around the decision boundary defined by the distance to the nearest training patterns [41]. 
These patterns, called support vectors, define the classification function, and their number is 
minimised by maximising the margin [41]. This is achieved through a kernel function K, which 
maps the data into a higher-dimensional space where a hyperplane performs the separation 
[41]. In its simplest form the kernel function is just a dot product between the input pattern 
and a member of the support set, resulting in a linear classifier, while nonlinear kernel functions 
lead to, for example, a polynomial classifier [140]. 
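
For reference, the two kernel forms mentioned above can be written out explicitly. The notation is an assumption introduced here for illustration and is not taken from the text: $\mathbf{x}$ is the input pattern, $\mathbf{s}_i$ a support vector with class label $y_i \in \{-1, +1\}$, $\alpha_i$ its learnt coefficient, $b$ a bias term, and $p$ the polynomial degree.

```latex
% Linear kernel (a plain dot product) and a polynomial kernel of degree p
K_{\text{linear}}(\mathbf{x}, \mathbf{s}_i) = \mathbf{x} \cdot \mathbf{s}_i
\qquad
K_{\text{poly}}(\mathbf{x}, \mathbf{s}_i) = (\mathbf{x} \cdot \mathbf{s}_i + 1)^{p}

% Two-class decision function built from the support set
f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i} \alpha_i \, y_i \, K(\mathbf{x}, \mathbf{s}_i) + b \Big)
```

Only the support vectors appear in the sum, which is why a small support set keeps classification inexpensive.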

SVMs are closely related to Neural Networks, being a close cousin to classical multilayer 
perceptrons, with the use of a sigmoid kernel function making them equivalent to two-layer 
perceptrons [1]. However, in the training of NNs, such as multi-layer perceptrons, the weights 
of the network are found by solving a non-convex unconstrained minimisation problem, while 
training a SVM with a kernel function requires solving a quadratic programming problem with 
linear constraints [276]. 

An important advantage of SVMs is that they offer the possibility to train generalisable 
nonlinear classifiers in high-dimensional spaces using a small training set [313]. Furthermore, 
for large training sets a small support set is typically selected for designing the classifier, thereby 
minimising the computational requirements during training [313]. 

4.4.1.6 Choice of Technique 

Template Matching is not suitable for the required pattern recognition of our targeted migration, 
because its effective use is domain specific [140] and the similarity recognition between the 
semantic descriptions of Agents is very different to the domains to which it is typically applied 
[233]. Statistical Classification is also not suitable, because the embedded similarity recognition 
component of each Agent would require human intervention for variable selection and 
transformation [206]. Structural Matching is suitable theoretically, but implementations lead 
to many difficulties [140], including the segmentation of noisy patterns (to detect primitives) 
and the inference of grammar from training data [140]. There can also be a combinatorial 
explosion of possibilities to be investigated, demanding large training sets and significant 
computational effort [248], neither of which is available. Neural Networks are suitable, given 
their low dependence on domain-specific knowledge and the availability of efficient learning 
algorithms [140]. Support Vector Machines, albeit a recent development [316], are also suitable 
[145], being primarily a binary classifier [140] for training generalisable nonlinear classifiers in 
high-dimensional spaces using small training sets [313]. So, we will make use of both Neural 
Networks and Support Vector Machines for the theoretical completion and implementation of 
our targeted migration. 

4.4.2 Neural Networks 



So, in the first instance, we will leverage the pattern recognition capabilities of Neural Networks 
(NNs) for the embedded similarity recognition components of the Agents, allowing them to 
determine similarity to one another based on the similarity of their semantic descriptions. 
We will use multilayer perceptrons (feed-forward artificial NNs) with backpropagation [131] to 
provide the required pattern recognition behaviour, because of their ability to solve problems 
stochastically, which allows for approximate solutions to extremely complex problems [131]. 
They are a modification of the standard linear perceptron [268], using three or more layers 
of neurons (nodes) with nonlinear activation functions to distinguish data that is not linearly 
separable, i.e. not separable by a hyperplane [35]. The power of the multilayer perceptron comes 
from its similarity to certain biological neural networks in the human brain, and because of 
its wide applicability it has become the standard algorithm for any supervised-learning pattern 
recognition process [131]. 

A pre-processing [35] of Agent semantic descriptions will be required that is consistent across 
the entire Digital Ecosystem, requiring an alphabetical ordering of the attribute tuples within a 
semantic description and a standardisation of the length of the attributes, before finally making 
use of a binary encoding for processing by a NN [35]. The assumption of information structured as 
tuples, including an attribute name and attribute value, is accurate for our simulated semantic 
descriptions, but is also a reasonable assumption for any semantic description of web services 
[87, 205, 72, 42]. To standardise the length of the attributes, after removing any white-space¹, 
a length of six characters will be used, because 5.39 is the average word length 
for business English [100]. For the binary encoding we propose using Unicode (UTF-8), which 
is based on extending ASCII to provide multilingual support [108]. However, ASCII's support 
of only English [108] will be sufficient for our simulations. The size (number of neurons) of the 
input layer [131] will be proportional to the semantic description of the Agent in which the 
NN is embedded, taking advantage of the variation in length of different semantic descriptions, 
which will assist the NN-based pattern recognition in determining dissimilarity. 
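
The pre-processing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation's code: the function names are hypothetical, padding short attributes with a space and truncating long ones are assumptions (the text only fixes the length at six characters), and eight bits per ASCII character is likewise assumed.

```python
ATTRIBUTE_LENGTH = 6  # standardised attribute length in characters, as proposed above
PAD_CHAR = " "        # assumption: the padding character is not specified in the text


def standardise(text: str, length: int = ATTRIBUTE_LENGTH) -> str:
    """Remove all white-space, then truncate or pad to a fixed length."""
    stripped = "".join(text.split())
    return stripped[:length].ljust(length, PAD_CHAR)


def encode_description(description: list[tuple[str, str]]) -> list[int]:
    """Turn (attribute name, attribute value) tuples into a flat list of bits."""
    bits: list[int] = []
    # Alphabetical (case-sensitive) ordering of the tuples makes the encoding
    # independent of the order in which the attributes were written down.
    for name, value in sorted(description):
        for field in (name, value):
            # ASCII only, as in the simulations; other characters raise an error.
            for byte in standardise(field).encode("ascii"):
                # Eight bits per character, most significant bit first.
                bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits
```

With this scheme a description of n tuples yields 96n bits, so the input layer grows with the length of the semantic description, as described above.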

We will use a single hidden layer, which is usually sufficient for most tasks [35]. Its size 
will be determined through exploratory programming [294] in our simulations, because of the 
difficulty in determining the optimal size without training several networks and estimating the 
generalisation error [274], as evidenced by the range of inconsistent rules of thumb [37, 302, 34, 38] 
available to define the optimal size. The output layer [35] will consist of a single neurone to 
provide a binary (true or false) response to the question of whether another Agent's semantic 
description is similar to the Agent's own semantic description. We will use a threshold of 0.90 
on its output for the determination of similarity. The overall structure of the Neural Network 
is visualised in Figure 4.8. 

¹ A white-space is any single character or series of characters that represent horizontal or vertical space in 
typography [101]. 

Figure 4.8: Neural Network for the Similarity Recognition Component of Agents in Targeted 
Migration: It consists of an input layer proportional to the semantic description of the Agent in 
which it is embedded, a single hidden layer, and an output layer consisting of a single neurone 
to provide a binary response to the question of whether another Agent's semantic description 
is similar. 



Multilayer perceptrons use nonlinear activation functions, which were developed to model the 
frequency of action potentials (firing) of biological neurons in the brain [131]. The main 
activation function used in current applications is the sigmoid function [131], a hyperbolic 
tangent that is normalised, in which the output y of a neurone is computed from the sum of its 
weighted input values x [35], 

$$ y = \left(1 + e^{-x}\right)^{-1} \qquad (4.2) $$

The weights x between the neurons will be randomly initialised, then trained to the real numbers 
that provide the desired functionality, because learning occurs in the perceptron by changing 
the connection (synaptic) weights after each piece of data is processed, based on the error of 
the output compared to the expected result [35]. This is an example of supervised learning and 
is carried out through backpropagation, a generalisation of the least mean squares algorithm 
[131]. The network is therefore trained by providing it with input and corresponding output 
patterns [35]. 
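
The structure and learning rule described above can be sketched in a few lines. This is an illustrative sketch only, not the Joone-based implementation used in the simulations: the class and method names, the initial weight range, the learning rate, and treating an output equal to the threshold as similar are all assumptions.

```python
import math
import random

SIMILARITY_THRESHOLD = 0.90  # threshold on the output neurone, as stated above


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class SimilarityNetwork:
    """One hidden layer and a single output neurone; names are illustrative."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Randomly initialised weights; the last entry of each row is the bias.
        self.hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                       for _ in range(n_hidden)]
        self.output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, bits):
        inputs = list(bits) + [1.0]
        hidden_out = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                      for row in self.hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.output, hidden_out + [1.0])))
        return hidden_out, out

    def is_similar(self, bits) -> bool:
        return self.forward(bits)[1] >= SIMILARITY_THRESHOLD

    def train_step(self, bits, target: float, rate: float = 0.5) -> None:
        """One backpropagation update for a single (input, target) pair."""
        inputs = list(bits) + [1.0]
        hidden_out, out = self.forward(bits)
        # Error signal at the output neurone (squared error, sigmoid derivative).
        delta_out = (target - out) * out * (1.0 - out)
        # Error signals at the hidden neurones, using the pre-update output weights.
        delta_hidden = [delta_out * self.output[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden_out)]
        for j, h in enumerate(hidden_out + [1.0]):
            self.output[j] += rate * delta_out * h
        for j, row in enumerate(self.hidden):
            for i, v in enumerate(inputs):
                row[i] += rate * delta_hidden[j] * v
```

Repeatedly calling `train_step` with a positive example moves the output towards 1, and `is_similar` then applies the 0.90 threshold to give the binary response.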

The NN-based embedded similarity recognition component of an Agent will be trained when 
the Agent is deployed to a Habitat of the Digital Ecosystem. The initial training set will consist 
of the semantic description of the Agent as a positive match, and variants created from its own 
semantic description. If the variant is less than 10% different it will be processed as a positive 
match, else it will be processed as a negative match. The training set can be extended based on 
experience: when an Agent visits a Habitat through targeted migration (i.e. a Habitat acquired 
from an inter-Agent interaction), the semantic description of the interacting Agent can be 
appended to the training set as a positive match if visiting the Habitat proves successful, 
else as a negative match. 
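
The construction of the training set can be sketched as follows. The text does not say how variants are generated or how the percentage difference is measured, so both are assumptions here: variants are made by replacing random character positions, and difference is the fraction of positions at which two equal-length standardised strings differ. All function names are hypothetical.

```python
import random

MAX_POSITIVE_DIFFERENCE = 0.10  # variants less than 10% different are positive matches


def difference(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length strings differ (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("descriptions must be standardised to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def make_variant(original: str, n_changes: int, rng: random.Random) -> str:
    """Hypothetical variant generator: overwrite n_changes positions with random letters."""
    chars = list(original)
    for i in rng.sample(range(len(chars)), n_changes):
        chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)


def initial_training_set(original: str, n_variants: int, seed: int = 0):
    rng = random.Random(seed)
    examples = [(original, True)]  # the Agent's own description is a positive match
    for _ in range(n_variants):
        variant = make_variant(original, rng.randint(1, len(original) // 2), rng)
        # The label depends on the measured difference, not on n_changes, because
        # a random replacement can reproduce the original character.
        label = difference(original, variant) < MAX_POSITIVE_DIFFERENCE
        examples.append((variant, label))
    return examples


def extend_with_experience(examples, other_description: str, visit_successful: bool):
    # The outcome of a targeted migration labels the interacting Agent's description.
    examples.append((other_description, visit_successful))
```

The same labelled examples could be passed to either the NN-based or the SVM-based component, in keeping with the shared training sets described in the next subsection.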



4.4.3 Support Vector Machines 



In the second instance, we will leverage the pattern recognition capabilities of Support 
Vector Machines (SVMs) for the embedded similarity recognition components of the Agents, 
allowing them to determine similarity to one another based on the similarity of their semantic 
descriptions. As SVMs are closely related to Neural Networks, being a close cousin to classical 
multilayer perceptrons [1], we will make use of the pre-processing and the training sets defined 
in the previous subsection, which will also ensure a fair comparison of the pattern recognition 
techniques in empowering the similarity recognition components of the Agents. 

The selection of a suitable kernel function is important, since it defines the feature space in 
which the training set is classified [62], operating as shown in Figure 4.9. A Radial Basis 
Function (RBF) kernel is recommended for text categorisation [145], with the most common form of 
the RBF being Gaussian [123]. 
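
A Gaussian RBF kernel, and the way a trained SVM uses it to classify, can be sketched as below. The parameter `gamma`, its default value, and the function names are assumptions for illustration; the coefficients and bias would come from training, which is not shown.

```python
import math


def rbf_kernel(x, y, gamma: float = 0.5) -> float:
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, coefficients, bias: float,
                 gamma: float = 0.5) -> bool:
    """True if sum_i c_i * K(s_i, x) + bias is positive.

    Each coefficient c_i stands for the product of a learnt multiplier and the
    class label (+1 or -1) of support vector s_i.
    """
    score = sum(c * rbf_kernel(s, x, gamma)
                for s, c in zip(support_vectors, coefficients))
    return score + bias > 0
```

The kernel equals 1 for identical inputs and decays towards 0 as they move apart, so a pattern is classified mainly by the support vectors closest to it.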

Training a SVM requires solving a large quadratic programming (QP) optimisation problem, 
which Sequential Minimal Optimisation (SMO) breaks into a series of the smallest possible 
QP problems. SMO solves these small QP problems analytically, which avoids using a 
time-consuming numerical QP optimisation. SMO scales between linear and quadratic time 
complexity, relative to the size of the training set, because it avoids matrix computation [250]. 
The alternative, a standard Projected Conjugate Gradient (PCG) chunking algorithm, scales 
between linear and cubic time complexity, relative to the size of the training set [250]. So SMO 
is faster, up to a thousand times on real-world sparse data sets [250]. 

Figure 4.9: Support Vector Machine (modified from [304]): Visualisation showing the training 
set in the Input Space, and its binary classification by a hyperplane in the higher dimensional 
Feature Space, achieved through the kernel function. A Radial Basis Function (RBF) kernel is 
recommended for text categorisation [145], with the most common form of the RBF being 
Gaussian [123]. 

The issue of the learnt behaviour of the embedded similarity recognition component of an 
Agent, whether SVM or NN based, being inherited when the Agent reproduces is known as 
the Baldwin effect [19]. The Baldwin effect has always been controversial within biological 
ecosystems [329], primarily because of the problem of confirming it experimentally [298]. Also, 
offspring in biological ecosystems can be genetically different to their parents [29], such that 
any learnt behaviour could potentially be inappropriate. However, the offspring in our Digital 
Ecosystem are genetically identical to their parents (in terms of the individual Agents), and 
so it makes little sense to force the loss of learnt behaviour. Therefore, we doubt that the 
Baldwin effect will adversely affect our Digital Ecosystem, which we will confirm through our 
simulations. 



Now that the targeted migration is theoretically complete, with two alternative pattern 
recognition techniques, we can confirm its effect experimentally through simulations. 

4.5 Simulation and Results 

We simulated the Digital Ecosystem using our simulation from section 2.3 (unless otherwise 
specified), adding the classes and methods necessary to implement the proposed clustering 
catalyst and targeted migration augmentations. Each experimental scenario was run ten 
thousand times for statistical significance of the means and standard deviations calculated. 

4.5.1 Clustering Catalyst 

We implemented the clustering catalyst as defined in sections 4.2.1 and 4.3, with the clusters 
determined by hierarchical agglomerative average-link clustering or our Physical Complexity 
clustering. We also made use of our simulations from Chapter 3 to simulate evolving Agent 
Populations with between two and six clusters, varied according to a Gaussian distribution, 
with the crossover rate increased from 10% to 25% to provide a greater opportunity for the 
clustering catalyst to operate. 

4.5.1.1 Control 

The clustering catalyst benefited from the additional crossover, which alone could have been 
responsible for any observed optimisation, because it increased recombination [168] in the 
evolving Agent Populations, thereby increasing variation in the exploration of solutions and 
potentially reducing the number of generations required to evolve the optimal solution. Therefore, 
our experimental simulations included a crossover control for the additional crossover, which 
excluded the clustering catalyst. 

4.5.1.2 Hierarchical Clustering 

We started with the hierarchical agglomerative average-link clustering [239] based clustering 
catalyst, as defined in section 4.3.1, making use of RapidMiner [256] to perform the required 
clustering. In Figure 4.10 we graphed for the simulation runs the average number of generations 
required to evolve the optimal solution, for the evolving Agent Populations with the hierarchical 
clustering based clustering catalyst, compared to the evolving Agent Populations with the 
crossover control, and the evolving Agent Populations alone. The evolving Agent Populations 
alone averaged 296 (3 s.f.) generations with a standard deviation of 12.76 (2 d.p.), while 
the evolving Agent Populations with the crossover control showed a 9% reduction, averaging 
267 (3 s.f.) generations with a standard deviation of 8.91 (2 d.p.). The evolving Agent 
Populations with the hierarchical clustering based clustering catalyst failed to provide any 
further optimisation, averaging 281 (3 s.f.) generations with a standard deviation of 20.25 
(2 d.p.). 

Figure 4.10: Graph of the Hierarchical Clustering Based Clustering Catalyst: The evolving 
Agent Populations alone averaged 296 (3 s.f.) generations, while the evolving Agent Populations 
with the crossover control showed a 9% reduction, averaging 267 (3 s.f.) generations. The 
evolving Agent Populations with the hierarchical clustering based clustering catalyst failed to 
provide any further optimisation, averaging 281 (3 s.f.) generations. 

4.5.1.3 Physical Complexity Clustering 

Next we considered the Physical Complexity based clustering catalyst, as defined in section 4.3.2, 
making use of our simulations from section 3.2.3. In Figure 4.11 we graphed for the simulation 
runs the average number of generations required to evolve the optimal solution, for the 
evolving Agent Populations with the Physical Complexity based clustering catalyst, compared 
to the evolving Agent Populations with the hierarchical clustering based clustering catalyst, the 
evolving Agent Populations with the crossover control, and the evolving Agent Populations 
alone. The evolving Agent Populations with the Physical Complexity based clustering catalyst 
averaged 274 (3 s.f.) generations with a standard deviation of 17.53 (2 d.p.), better than 
the evolving Agent Populations with the hierarchical clustering based clustering catalyst which 
averaged 281 (3 s.f.) generations with a standard deviation of 20.25 (2 d.p.), but still worse 
than the evolving Agent Populations with the crossover control which averaged 267 (3 s.f.) 
generations with a standard deviation of 8.91 (2 d.p.). 

Figure 4.11: Graph of the Physical Complexity Based Clustering Catalyst: The evolving Agent 
Populations with the Physical Complexity based clustering catalyst averaged 274 (3 s.f.) 
generations, better than the evolving Agent Populations with the hierarchical clustering based 
clustering catalyst which averaged 281 (3 s.f.) generations, but still worse than the evolving 
Agent Populations with the crossover control which averaged 267 (3 s.f.) generations. 

Additional experiments were conducted in which we varied several different parameters to 
determine if there were any conditions under which the clustering catalyst was effective. The 
varied parameters included the population size, the mutation rate and the crossover rate. 
However, extensive testing through multiple scenarios failed to show any significant reduction 
in the number of generations required to evolve the optimal solution, confirming that the 
evolving Agent Populations with the clustering catalyst were less effective than the evolving 
Agent Populations with the crossover control. As the clustering catalyst was unsuccessful, in 
Figure 4.12 we graphed a typical run of each scenario to observe its behaviour and so better 
understand why it failed. However, there was no unexpected behaviour, confirming that the 
evolving Agent Populations with the clustering catalyst were simply less efficient than the 
evolving Agent Populations with the crossover control. 

Figure 4.12: Graph of Typical Runs for the Clustering Catalysts: As the clustering catalyst 
was unsuccessful, we graphed a typical run of each scenario to observe its behaviour and so 
better understand why it failed. However, there was no unexpected behaviour, confirming that 
the evolving Agent Populations with the clustering catalyst were simply less efficient than the 
evolving Agent Populations with the crossover control. 

The results showed that the clustering catalyst, using hierarchical clustering [239] or Physical 
Complexity clustering, failed to optimise the evolutionary processes (with clusters) of the 
Digital Ecosystem. Additionally, typical runs of each showed no adverse behaviour, which 
might have explained the failure to optimise the evolutionary processes. Therefore, the results 
collectively confirm that the intra-cluster crossover assignment of the clustering catalyst was 
less efficient than the random crossover assignment of the crossover control. The clustering 
catalyst intuitively had potential, but most likely failed because the individuals within the 
evolving Agent Populations lacked sufficient complexity (relative to biological populations [29]) 
for the mechanism to be effective, leading to the crossing of very similar individuals, producing 
offspring that were very similar to their parents, and therefore not actually achieving valuable 
change. 
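
The contrast between the two crossover assignments can be illustrated with a short sketch. The catalyst itself is defined in sections 4.2.1 and 4.3, so the details here are assumptions: the function names, the `cluster_of` callback that returns a cluster label for an individual, and choosing a cluster uniformly at random are all hypothetical.

```python
import random


def random_pairs(population, n_pairs: int, rng: random.Random):
    """Crossover control: both parents drawn at random from the whole population."""
    return [tuple(rng.sample(population, 2)) for _ in range(n_pairs)]


def intra_cluster_pairs(population, cluster_of, n_pairs: int, rng: random.Random):
    """Clustering catalyst: both parents drawn from the same cluster."""
    clusters = {}
    for individual in population:
        clusters.setdefault(cluster_of(individual), []).append(individual)
    # Only clusters with at least two members can supply a pair of parents.
    eligible = [members for members in clusters.values() if len(members) >= 2]
    if not eligible:
        return []
    return [tuple(rng.sample(rng.choice(eligible), 2)) for _ in range(n_pairs)]
```

Because members of one cluster are similar by construction, intra-cluster pairs tend to produce offspring close to both parents, which matches the explanation offered above for the lack of improvement.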

4.5.2 Targeted Migration 

We implemented the targeted migration as defined in sections 4.2.4 and 4.4, using both Neural 
Network (NN) and Support Vector Machine (SVM) based similarity recognition components 
embedded within the Agents. We also made use of our Xgrid [159] modifications from Chapter 
3 to take advantage of the grids mentioned in the acknowledgements. 

4.5.2.1 Controls 

The targeted migration was dependent on additional Agent migration, which alone could have 
been responsible for any observed optimisation, because it led to greater distribution of the 
Agents within the Digital Ecosystem, potentially improving responsiveness for the user base. 
So, we included a migration control in our experimental simulations for the additional Agent 
migration, being random instead of targeted. Furthermore, to determine the contribution of 
the NNs and SVMs on the targeted migration we created a pattern recognition control, using a 
rudimentary distance function adapted from the fitness function defined in section 2.3.3. 

In Figure 4.13 we graphed for the simulation runs the average of the percentage response rate 
after a thousand time steps (user request events), for the Digital Ecosystem with the migration 
control, and the Digital Ecosystem with the pattern recognition control, compared to the Digital 
Ecosystem alone. The Digital Ecosystem alone averaged a 68.0% (3 s.f.) response rate with 
a standard deviation of 2.61 (2 d.p.), while the Digital Ecosystem with the migration control 
showed a significant degradation to 49.6% (3 s.f.) with a standard deviation of 1.96 (2 d.p.), and 
the Digital Ecosystem with the pattern recognition control showed only a small increase to 70.5% 
(3 s.f.) with a standard deviation of 2.60 (2 d.p.). Therefore, any observed improvement from 
the targeted migration would come not from the additional migration but from its targeting, and 
the effectiveness of the pattern recognition functionality will be significant if the targeted 
migration is to be effective. 

In Figure 4.14 we graphed a typical run of the Digital Ecosystem with the migration control, and 
the Digital Ecosystem with the pattern recognition control, compared to the Digital Ecosystem 
alone (taken from Figure 2.25). The Digital Ecosystem alone performed as expected, adapting 
and improving over time to reach a mature state through the process of ecological succession [29]. 
The Digital Ecosystem with the migration control, which included additional random migration, 
while initially beneficial, ultimately decreased the responsiveness of the Digital Ecosystem. 
Finally, the Digital Ecosystem with the pattern recognition control performed only marginally 
better than the Digital Ecosystem alone. 

Figure 4.13: Graph of the Targeted Migration Controls and the Digital Ecosystem: The Digital 
Ecosystem alone averaged a 68.0% (3 s.f.) response rate, while the Digital Ecosystem with the 
migration control showed a significant degradation to 49.6% (3 s.f.), and the Digital Ecosystem 
with the pattern recognition control showed only a small increase to 70.5% (3 s.f.). 

Figure 4.14: Graph of Typical Runs for the Targeted Migration Controls and the Digital 
Ecosystem: The Digital Ecosystem alone performed as expected, adapting and improving over 
time to reach a mature state. The migration control with additional random migration ultimately 
decreased the responsiveness, while the pattern recognition control performed only slightly better. 



4.5.2.2 Neural Networks 

We started with the NN-based targeted migration, as defined in section 4.4.2. We made use 
of Joone (Java Object Oriented Neural Engine) [194] to implement the required NNs, and 
exploratory programming [294] to determine that a hidden layer 1.5 times the size of the input 
layer was effective for the NN-based similarity recognition components. 
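
The sizing rule above can be illustrated with a minimal sketch. It does not use Joone; it is a plain Python feedforward pass with a single hidden layer, and the names hidden_size, make_network and forward are hypothetical. The single sigmoid output unit, the random initial weights, and rounding the hidden layer size up are assumptions made for illustration only, and no training is shown.

```python
import math
import random


def hidden_size(n_inputs, ratio=1.5):
    """Hidden layer size as a multiple of the input layer size (rounded up, an assumption)."""
    return max(1, math.ceil(n_inputs * ratio))


def make_network(n_inputs, seed=0):
    """Create untrained weights for one hidden layer and one output unit."""
    rng = random.Random(seed)
    n_hidden = hidden_size(n_inputs)
    w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return w_hidden, w_out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, inputs):
    """Return a similarity-like score in (0, 1) for one input vector."""
    w_hidden, w_out = network
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
```

For example, an input layer of eight units gives a hidden layer of twelve units under this rule.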

In Figure 4.15 we graphed for the simulation runs the average of the percentage response rate 
after a thousand time steps (user request events), for the Digital Ecosystem with the NN-based 
targeted migration, compared to the Digital Ecosystem alone. The Digital Ecosystem alone 
averaged a 68.0% (3 s.f.) response rate with a standard deviation of 2.61 (2 d.p.), while the 
Digital Ecosystem with the NN-based targeted migration showed a significant improvement to 
a 92.1% (3 s.f.) response rate with a standard deviation of 2.22 (2 d.p.). 

Figure 4.15: Graph of Neural Networks Based Targeted Migration: The Digital Ecosystem
alone averaged a 68.0% (3 s.f.) response rate with a standard deviation of 2.61 (2 d.p.), while
the Digital Ecosystem with the NN-based targeted migration showed a significant improvement
to a 92.1% (3 s.f.) response rate with a standard deviation of 2.22 (2 d.p.).

4.5.2.3 Support Vector Machines



Next we considered the SVM-based targeted migration, as defined in section 4.4.3, making 
use of LIBSVM (Library for Support Vector Machines) [50] to implement the required SVMs. 
In Figure 4.16 we graphed for the simulation runs the average of the percentage response 
rate after a thousand time steps (user request events), for the Digital Ecosystem with the 
SVM-based targeted migration, compared to the Digital Ecosystem with the NN-based targeted 
migration, and the Digital Ecosystem alone. The Digital Ecosystem with the SVM-based 
targeted migration averaged a 92.8% (3 s.f.) response rate with a standard deviation of 2.09 (2 
d.p.), slightly better than the NN-based targeted migration at 92.1% (3 s.f.) with a standard 
deviation of 2.22 (2 d.p.), and so significantly better than the Digital Ecosystem alone at 68.0% 
(3 s.f.) with a standard deviation of 2.61 (2 d.p.). 
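
To put the reported differences on a common scale, the following sketch computes the gain in percentage points and a standardised mean difference (Cohen's d) from the means and standard deviations quoted above. It is not part of the original analysis: the number of simulation runs is not used, equal group sizes are assumed for the pooled standard deviation, and the function names are hypothetical.

```python
import math

# Reported mean response rate (%) and standard deviation for each configuration.
RESULTS = {
    "alone": (68.0, 2.61),
    "nn": (92.1, 2.22),
    "svm": (92.8, 2.09),
}


def gain(baseline, other):
    """Difference in mean response rate, in percentage points."""
    return other[0] - baseline[0]


def cohens_d(baseline, other):
    """Standardised mean difference, assuming equal-sized groups."""
    (mean_a, sd_a), (mean_b, sd_b) = baseline, other
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd
```

Under these assumptions, the NN-based targeted migration sits roughly ten pooled standard deviations above the Digital Ecosystem alone, whereas the SVM-based result is only about a third of a pooled standard deviation above the NN-based one, which is consistent with describing the former difference as large and the latter as slight.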



Figure 4.16: Graph of Support Vector Machine Based Targeted Migration: The Digital
Ecosystem with the SVM-based targeted migration averaged a 92.8% (3 s.f.) response rate,
slightly better than the NN-based targeted migration at 92.1% (3 s.f.), and so significantly better
than the Digital Ecosystem alone at 68.0% (3 s.f.).

Figure 4.17: Graph of Typical Runs for the Digital Ecosystem and Targeted Migration: The
Digital Ecosystem alone performed as expected, adapting and improving over time to reach
a mature state through the process of ecological succession [29]. In comparison, the Digital
Ecosystem with the targeted migration, NN or SVM-based, showed a significant improvement.

In Figure 4.17 we graphed typical runs of the Digital Ecosystem with the SVM-based targeted
migration, the Digital Ecosystem with the NN-based targeted migration, and the Digital
Ecosystem alone. The Digital Ecosystem alone performed as expected, adapting and improving
over time to reach a mature state through the process of ecological succession [29], approaching
70% effectiveness for the user base. The Digital Ecosystem with the targeted migration, NN or 
SVM-based, showed a significant improvement in the ecological succession, reaching the same 
performance in less than a fifth of the time, before reaching over 90% effectiveness for the 
user base. To show more clearly the greater effectiveness of the SVM-based targeted migration, 
compared to the NN-based targeted migration, we graphed in Figure 4.18 the frequency of poor 
matches (<50%) every one hundred time steps, for the Digital Ecosystem with the SVM-based 
targeted migration, compared to the Digital Ecosystem with the NN-based targeted migration, 
and the Digital Ecosystem alone. 
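
The frequency measure described above can be sketched as follows. This is an illustrative reconstruction rather than the code used for the simulations: it assumes the per-time-step match quality is available as a list of percentages, and the function name is hypothetical.

```python
def poor_match_frequencies(responses, window=100, threshold=50.0):
    """Count responses below the threshold in each consecutive window of time steps."""
    counts = []
    for start in range(0, len(responses), window):
        chunk = responses[start:start + window]
        counts.append(sum(1 for r in chunk if r < threshold))
    return counts
```

Applied to a run of one thousand time steps with the default window, this yields ten counts, one for each block of one hundred time steps.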

The results showed that the targeted migration optimised and accelerated the ecological 
succession [29] of our Digital Ecosystem, constructively interacting with its ecological and 
evolutionary dynamics. The results also showed that it was not the additional migration, but its 
targeting that created the improvement in the Digital Ecosystem, and that an effective pattern 
recognition technique was required for the targeted migration to operate effectively. Both NNs 
and SVMs proved to be effective, SVMs marginally more than NNs. The results also showed 
that there were no adverse side-effects from the Baldwin effect [19], the inheritance of learnt
behaviour in the Agents from the embedded similarity recognition components, whether SVM
or NN based. Finally, based on the experimental results, and our theoretical understanding,
we would recommend SVMs for the pattern recognition functionality of the targeted migration.

Figure 4.18: Graph of Frequencies for the Targeted Migration: The frequency of poor matches
(<50%) every one hundred time steps, for the Digital Ecosystem with the SVM-based targeted
migration, compared to the Digital Ecosystem with the NN-based targeted migration, and the
Digital Ecosystem alone. It shows the greater effectiveness of the SVM-based targeted migration,
compared to the NN-based targeted migration from the seven hundredth generation onwards.



4.6 Summary and Discussion 



We started by reviewing the scope for optimisation and acceleration, which arises because the
evolutionary self-organisation of ecological succession (the formation of a mature ecosystem)
is a slow process [29], even in the accelerated form present in our Digital Ecosystem, as
identified and confirmed in the previous chapters. In the results of Chapter 2,
specifically the ecological succession experiment, the Digital Ecosystem reached only 70%
responsiveness, identifying potential for improvement. This was confirmed by the results
of Chapter 3, specifically the self-organised diversity experiment, in which the Digital
Ecosystem responded more slowly in some scenarios than in others.
Furthermore, the optimisation of Digital Ecosystems sought was not that
of parameter optimisation, which is achievable through exploratory programming [294], but
an augmentation to the Ecosystem-Oriented Architecture providing a significant improvement 
in performance, i.e. better solutions for the users than the Digital Ecosystem alone could 
achieve. We then discovered that we would be unlikely to find optimising or accelerating 
augmentations for our Digital Ecosystem from biological ecosystems research, because ecological 
optimisation is concerned with the maintenance of diversity and stability [323, 3, 224], and 
ecological acceleration is similarly concerned with the re-establishment of diversity and stability 
[201, 181, 336]. This was unsurprising, because one of the fundamental differences between
biological and digital ecosystems lies in the motivation and approach of their researchers:
biological ecosystems are ubiquitous natural phenomena whose maintenance is crucial to
our survival [20], whereas Digital Ecosystems are a technology engineered to serve specific
human purposes.

So, we proposed, constructed, and explored several alternative augmentations to
accelerate or optimise the evolutionary and ecological self-organising dynamics of our Digital 
Ecosystem, based on the understanding and results from the previous chapters, and our general 
knowledge and intuition. The clustering catalyst aimed to optimise the evolutionary self- 
organisation of evolving Agent Populations with clusters, by encouraging intra-cluster crossover 
to directly accelerate the evolving of applications (Agent-sequences) in response to user requests, 
and therefore the responsiveness of the Digital Ecosystem to the user base. The replacement 
aggregator would have replaced the use of evolutionary computing [90], for the aggregation 
of the Agents into optimal applications (Agent-sequences) in response to user requests, with 
an alternative technique, directly optimising the responsiveness of the Digital Ecosystem to 
the user base. The Agent-pool aggregation aimed to allow for the creation of potentially 
useful applications (Agent-sequences) or partial applications inside the Agent-pools, optimising 
the Agent-sequences found at the Agent-pools of the Habitats, which would in turn optimise 
the evolving Agent Populations, and therefore the responsiveness of the Digital Ecosystem to 
the user base. The targeted migration aimed to complement the evolving Agent Populations 
indirectly, with additional highly targeted migration to support the existing Agent migration 
between the Habitats, optimising the Agents found at the Agent-pools of the Habitats, which 
would in turn optimise the evolving Agent Populations, and therefore the responsiveness of 
the Digital Ecosystem to the user base. The replacement aggregator would have weakened the
Ecosystem-Oriented Architecture of Digital Ecosystems, and the Agent-pool aggregation would
have incurred an impractical computational cost to operate, so we chose the most promising
augmentations, the clustering catalyst and the targeted migration, to be completed theoretically
and then investigated experimentally through simulations. 

The clustering catalyst augmentation aimed to directly optimise the evolutionary self- 
organisation of evolving Agent Populations with clusters, by encouraging intra-cluster crossover. 
Crossover involves the crossing of two Agent-sequences in the creation of new Agent-sequences, 
occurring during the replication stage of the evolutionary cycle [13]. Theoretical completion 
of the clustering catalyst required an algorithm to perform the clustering of evolving Agent 
Populations. So, we considered alternative clustering algorithms [142, 139, 239, 324, 180, 
339, 280] for their suitability to our evolving Agent Populations. We chose a hierarchical 
agglomerative average-link clustering algorithm [239] for our clustering catalyst, because a 
hierarchical algorithm was more appropriate [142] than a partitional one for the small size 
of the data sets of evolving Agent Populations, and because an agglomerative algorithm 
was conceptually simpler than a divisive one [190], for which the efficiency advantage [190] 
would have been negligible given the expected size of the data sets. We also preferred the
average-link algorithm because it is designed to reduce the dependence of the cluster-linkage
criterion on extreme values, such as the most similar or most dissimilar objects used by the
single-link and complete-link algorithms [239], and because the minimum-variance algorithm is
biased towards producing clusters with the same number of objects [211], which would have been
problematic for clusters emerging over the generations. We also considered a clustering
algorithm based on our extended
Physical Complexity from Chapter 3, because clustering is subjective by nature [142], and our 
extended Physical Complexity was augmented in section 3.2.2.3 to understand the clustering of 
evolving Agent Populations. So, we implemented the clustering catalyst using the hierarchical 
agglomerative average-link clustering [239] and our Physical Complexity clustering. The results 
showed that the clustering catalyst, using either clustering algorithm, failed to optimise the 
evolutionary processes of the Digital Ecosystem. It intuitively had potential, but most likely 
failed because the individuals within the evolving Agent Populations lacked sufficient complexity 
(relative to biological populations [29]) for the mechanism to be effective, leading to the crossing 
of very similar individuals, producing offspring that were very similar to their parents, and 
therefore not actually achieving valuable change. 
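
The hierarchical agglomerative average-link clustering discussed above can be sketched generically as follows. This is a textbook formulation under stated assumptions, not the implementation used in the experiments: the distance function is supplied by the caller, the stopping criterion is a target number of clusters, and the names average_link and agglomerate are hypothetical.

```python
from itertools import combinations


def average_link(cluster_a, cluster_b, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(items, dist, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[item] for item in items]  # start with one cluster per item
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_link(clusters[pair[0]],
                                          clusters[pair[1]], dist),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters
```

Because every pair of clusters is re-evaluated at each merge, this naive version is only practical for small data sets, which matches the small population sizes noted above.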

The targeted migration augmentation aimed to directly optimise the ecological migration, 
and therefore indirectly complement the evolutionary self-organisation of the evolving Agent 
Populations, through the highly targeted migration of the Agents to their niche Habitats. 
The migration probabilities between the Habitats produce the passive Agent migration,
allowing the Agents to spread in the correct general direction within the Habitat network,
based primarily upon success at their current location. The targeted migration works in 
a more active manner, allowing the Agents highly targeted migration to specific Habitats, 
based upon their interaction with one another to discover Habitats where they could be 
valuable (i.e. find a niche). Theoretical completion of the targeted migration required further 
consideration of how it would operate, including its effect on the Agent life-cycle, and a 
suitable pattern recognition [140] technique for the required similarity recognition between 
the semantic descriptions of the Agents. So, we considered alternative pattern recognition 
techniques [140, 262, 102, 233, 328, 246, 141, 123] for similarity recognition components to 
be embedded within the Agents. Template Matching was not suitable because its effective
use is domain specific [140] and the similarity recognition between the semantic descriptions of
Agents is very different to the domains in which it is typically applied [233]. Statistical Classification
was also not suitable, because the embedded similarity recognition component of each Agent
would have required human intervention for variable selection and transformation [206]. While
Structural Matching was a suitable technique theoretically, implementations have had many 
difficulties [140], including the segmentation of noisy patterns (to detect primitives) and the 
inference of grammar from training data [140]. There can also be a combinatorial explosion 
of possibilities to be investigated, demanding large training sets and significant computational 
effort [248], neither of which was available. Neural Networks (NNs) were suitable, given their low 
dependence on domain-specific knowledge and the availability of efficient learning algorithms 
[140]. Support Vector Machines (SVMs), albeit a recent development [316], were also suitable 
[145], being primarily a binary classifier [140] for training generalisable nonlinear classifiers 
in high-dimensional spaces using small training sets [313]. So, we implemented the targeted 
migration using both NN and SVM based similarity recognition components embedded within 
the Agents. The results showed that the targeted migration accelerated and optimised the 
ecological succession of the Digital Ecosystem, constructively interacting with its ecological 
and evolutionary dynamics, marginally more when powered by SVMs than NNs. So, based on 
the experimental results, and our theoretical understanding, we would recommend SVMs for 
the pattern recognition functionality of the targeted migration. 
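
The passive Agent migration summarised above, in which migration probabilities between Habitats determine where an Agent moves, can be illustrated with a small sketch. It treats the probabilities as fixed weights on the outgoing links of a Habitat, which is a simplification: how those probabilities are updated from Agent success is not modelled here, and the function name and data layout are hypothetical.

```python
import random


def migrate(habitat, migration_probabilities, rng=random):
    """Pick a destination Habitat, weighted by the outgoing migration probabilities.

    migration_probabilities maps each Habitat to a dict of
    {neighbouring Habitat: probability}. The Agent stays where it is
    if the Habitat has no usable outgoing links.
    """
    links = migration_probabilities.get(habitat, {})
    targets = list(links)
    weights = [links[target] for target in targets]
    if not targets or sum(weights) <= 0:
        return habitat
    return rng.choices(targets, weights=weights, k=1)[0]
```

The targeted migration would sit alongside this mechanism, selecting specific destination Habitats through the similarity recognition components rather than through link weights.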

The targeted migration also resulted in the Baldwin effect [19], the inheritance of learnt 
behaviour in the Agents, from the embedded similarity recognition components, whether 
SVM or NN based. While the Baldwin effect has always been controversial within biological 
ecosystems [329], primarily because of the problem of confirming it experimentally [298], 
it undoubtedly occurred in our Digital Ecosystem. However, the experimental results and
our exploratory programming [294] showed no adverse side-effects, therefore supporting the 
possibility of its presence in biological ecosystems. 

In this chapter we attempted the acceleration and optimisation of Digital Ecosystems, because 
ecological succession (the formation of mature ecosystems) is a slow process [29], even the 
accelerated form present in our Digital Ecosystems. While not all our attempts were 
successful, understandable considering the constructive nature of our efforts in this chapter, 
we have optimised and accelerated our Digital Ecosystem through the targeted migration of its 
Agents. The targeted migration significantly enhanced the ecological succession, constructively 
interacting with the ecological and evolutionary dynamics, helping the Agents to optimise their 
migration and distribution within our Digital Ecosystem. 



Chapter 5 



Conclusions 



5.1 Achievements 



Substantial parts of our efforts are original contributions in the area of Biologically-Inspired 
Computing [99] and the emerging field of Digital Ecosystems, with our major research 
contributions being as follows: 

• We have created the first interpretation of Digital Ecosystems where the word ecosystem 
is more than just a metaphor, which we have confirmed experimentally. They are the 
digital counterparts of biological ecosystems: having their properties of self-organisation, 
scalability and sustainability [173]; created through combining understanding from
theoretical ecology [175], evolutionary theory [104], Multi-Agent Systems [249], distributed
evolutionary computing [178], and Service-Oriented Architectures [228]. Furthermore, 
the Ecosystem-Oriented Architecture of Digital Ecosystems includes a novel form of 
distributed evolutionary computing, an optimisation technique working at two levels: 
a first optimisation, migration of Agents which are distributed in a peer-to-peer network, 
operating continuously in time; this process feeds a second optimisation, based on 
evolutionary computing, operating locally on single peers and aimed at finding solutions
that satisfy locally relevant constraints. So, the local search is improved through this 
twofold process to yield better local optima faster, as the distributed optimisation provides 
prior sampling of the search space through computations already performed in other peers 
with similar constraints. We have also defined the interaction of Digital Ecosystems with 
business ecosystems [214], specifically in supporting and enabling them to create Digital 
Business Ecosystems. 

• We have investigated the emergent self-organising properties of Digital Ecosystems,
because a primary motivation for our research is the desire to exploit the self- 
organising properties [173] of biological ecosystems. We started with the evolutionary 
self-organisation of ecological succession [29], which conformed to expectations [136]. 
Next we considered the self-organisation of the order constructing processes (the 
evolving Agent Populations). We extended Physical Complexity [8] to include evolving 
Agent Populations, which required extending definitions for populations of variable 
length sequences, creating a measure for the efficiency of information storage, and an 
understanding of clustering within Populations to support the non-atomicity of Agents. 
We then extended Chli-DeWilde stability [53] to include the evolutionary dynamics of 
evolving Agent Populations, building upon this to construct an entropy-based definition 
for the degree of instability, which was used to study the stability of evolving Agent 
Populations. Finally, the unique hybrid nature of Digital Ecosystems resulted in us 
creating our own definition for the self-organised diversity, based on the global distribution 
of the Agents in the Populations relative to the request behaviour of the user base. 
Overall an insight has been achieved into where and how self-organisation occurs in Digital 
Ecosystems, including what forms it can take and how it can be quantified. 

• We have optimised and accelerated Digital Ecosystems, because the evolutionary self- 
organisation of ecological succession [29] (the formation of a mature ecosystem) is a slow 
process, even the accelerated form present in Digital Ecosystems. So, we considered 
alternative augmentations, including the accelerating effect of a clustering catalyst on 
the evolutionary dynamics, through the acceleration of the evolutionary processes; and 
the optimising effect of targeted migration on the ecological dynamics, through the 
emergent optimisation of the Agent migration patterns. The experimental results 
showed that the clustering catalyst failed, despite intuitively having potential, most 
likely because the individuals within the evolving Agent Populations lacked sufficient 
complexity (relative to biological populations [29]) for the augmentation to be effective. 
However, the experimental results also showed that the targeted migration optimised 
and accelerated the ecological succession of Digital Ecosystems, constructively interacting 
with the ecological and evolutionary dynamics. We also discovered that there were no 
adverse side-effects from the Baldwin effect [19], the inheritance of learnt behaviour, 
in Digital Ecosystems. This supports the possibility of the Baldwin effect in
biological ecosystems, which has always been controversial [329].

5.2 Future Directions

Our efforts offer considerable scope for the future, with there being several interesting avenues 
to pursue, some of which we discuss below. 

5.2.1 Ecosystems Conceptualisation 

Conceptualising ecosystems has been an inherent part of this work, which presents us with 
an opportunity to formalise our current and future efforts to improve the cross-disciplinary 
knowledge transfer required. 

5.2.1.1 Biology of Digital Ecosystems 

In creating Digital Ecosystems, the digital counterpart of biological ecosystems, we naturally 
asked about their likeness to the biological ecosystems from which they came. Further to this, we could
consider the applicability of other aspects of ecosystems theory in understanding and analysing 
the dynamics of Digital Ecosystems. For example, energy pyramids 1 of biological ecosystems, 
what is their equivalent in Digital Ecosystems? Given that Digital Ecosystems are information- 
centric, whereas biological ecosystems are energy-centric [29], they would undoubtedly be 
information pyramids, but further definition would naturally require more research. 

5.2.1.2 Biological Design Patterns 

A design pattern is a general reusable solution to a commonly occurring problem in software 
design [106]. It is not a finished design that can be transformed directly into code, but 
a description or template for how to solve a problem that can be used in many different 
situations [106]. For example, object-oriented design patterns typically show relationships 
and interactions between classes or objects, without specifying the final application classes or 
objects that are involved [106]. Biological Design Patterns (BDPs) would extend this concept to 
catalogue common interactions between biological structures using a pattern-oriented modelling 
approach [122], which when applied would endow software systems with the desirable properties 
of biological systems, such as self-organisation, self-management, scalability and sustainability. 

1 Energy pyramids show the dissipation of energy at trophic levels, positions that organisms occupy in a food 
chain, e.g. producers or consumers [237]. 

5.2.1.3 Classes of Ecosystems

While evolutionary theory [104] was well understood within computer science, under the 
auspices of evolutionary computing [90], ecosystems theory [29], until our efforts, was not. 
Similarly, while evolutionary theory is well understood within linguistics [63] and economics
[227], ecosystems theory is not [218]. So, using our efforts as a case study, we could
follow the same process to create Language Ecosystems and Economic Ecosystems. For 
example, there are many separate efforts within linguistics using evolution to model language 
change [54], but there is no unifying framework, which has resulted from different linguists 
independently adopting elements of evolutionary theory [54]. So, we could provide a wide- 
ranging and encompassing definition of Language Ecosystems, which would unify the many 
disparate efforts in linguistics aimed at understanding language evolution. 

5.2.1.4 Generic Ecosystem Definition 

In the creation of Digital Ecosystems we considered aspects of biological ecosystems, including 
Agent-Based Modelling [116] and Complex Adaptive Systems (CAS) [173], and then constructed 
their counterparts in Digital Ecosystems. We then considered the possibility of a
Generic Ecosystem definition, as we suggested at the end of Chapter 2, without which some
of the counterparts we constructed appeared to be compromised, when they were actually
the realisation of generic abstract concepts in Digital Ecosystems. The most notable example
is the network structure, which is energy-centric in biological ecosystems [29], but
information-centric in Digital Ecosystems. So, there is potential to create a Generic Ecosystem definition, using a
suitable modelling technique such as CAS [322], which would abstractly define the key properties 
of an ecosystem, and would theoretically be applicable to any domain where the modelling 
technique has been applied. Therefore, the Generic Ecosystem definition would provide a 
framework for the application of ideas, concepts, and models from biological ecosystems to 
other classes of ecosystems, including Digital Ecosystems, Language Ecosystems and Economic 
Ecosystems. 

5.2.2 Simulation Framework 

An open-source simulation framework for Digital Ecosystems [161] was created by the Digital 
Business Ecosystem (DBE) project [92], and is currently supported by the Open Philosophies for
Associative Autopoietic Digital Ecosystems (OPAALS) project [258] to assist further research
into Digital Ecosystems, including the wider implications of interacting with social systems, 
such as business ecosystems of Small and Medium sized Enterprises (SMEs). 

5.2.3 Digital Business Ecosystems 

In an old market-based economy, made up of sellers and buyers, the parties exchange property 
[78]. While in a new network-based economy, made up of servers and clients in a business 
ecosystem [214], the parties share access to services and experiences [78]. Digital Ecosystems 
are a platform for the network-based economy of business ecosystems, providing mechanisms 
for the creation of Digital Business Ecosystems. 

5.2.3.1 Service Futures Market 

One such mechanism the Digital Ecosystem could provide to the network-based economy of 
business ecosystems [214], would be a futures market 2 for services. As each service (Agent) 
consists of an executable component and a semantic description, the latter acting as a guarantee
of behaviour, and the evolving Agent Populations only requiring the guarantees (semantic 
descriptions) to operate, the actual executable component of a service (Agent) is only required 
once an application (Agent-sequence) has been assembled. Therefore, service (Agent) evolution 
could operate entirely on the semantic descriptions, with business users only needing to supply 
the executable component of a service (Agent) once there is a demand, i.e. when the semantic 
description of one of their services has been used in the construction of an application which 
meets the request of another business user. This would create a futures market for evolving
services within Digital Business Ecosystems. 
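
The separation between semantic description and executable component described above can be sketched as a simple data structure. This is only an illustration of the idea, with hypothetical names: the description is represented as a plain string, the executable as an optional callable, and the helper lists which services in an assembled application still need their executable supplied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Agent:
    """A service: a semantic description plus an executable supplied on demand."""
    description: str
    executable: Optional[Callable[[], str]] = None


def outstanding_executables(agent_sequence: List[Agent]) -> List[str]:
    """Descriptions of Agents in an assembled application that lack an executable."""
    return [agent.description for agent in agent_sequence
            if agent.executable is None]
```

In this sketch, evolution would only ever read the description field, and a business user would be asked for an executable once one of their descriptions appears in the returned list.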

2 An auction market in which participants buy and sell commodities for an agreed price, that the sellers have
yet to produce [137].

5.2.3.2 Regional Deployment

A partial reference implementation [199] for our Digital Ecosystem, which includes an
implementation of the targeted migration, was created by the Digital Business Ecosystems
project [92], and we expect that, once completed, it will be deployed as part of the software
platform intended for the regional deployment of their Digital Ecosystems [264, 258]. Digital
Ecosystems (distributed adaptive open socio-technical systems, with properties of self-
organisation, scalability and sustainability, inspired by natural ecosystems [258]) are emerging 
as a novel approach to the catalysis of sustainable regional development driven by Small 
and Medium sized Enterprises (SMEs) [258]. The community focused on the deployment of 
Digital Ecosystems, REgions for Digital Ecosystems Network (REDEN) [265], is supported 
by projects such as the Digital Ecosystems Network of regions for (4) DissEmination and 
Knowledge Deployment (DEN4DEK) [264], a thematic network that aims to share experiences
and disseminate all the necessary knowledge that will allow regions to plan an effective 
deployment of Digital Ecosystems at all levels (economic, social, technical and political) to 
produce real impacts in the economic activities of European regions through the improvement 
of SME business environments. So, the next major step in our research will be to collect real 
world data, confirming that Digital Ecosystems operate effectively with business ecosystems in 
creating Digital Business Ecosystems. 



5.3 Concluding Remarks 



The ever-increasing challenge of software complexity in creating progressively more
sophisticated and distributed applications makes the design and maintenance of efficient and flexible
systems a growing challenge [209, 299, 193], for which current software development techniques 
have hit a complexity wall [184]. In response we have created Digital Ecosystems, the digital 
counterparts of biological ecosystems, possessing their properties of self-organisation, scalability 
and sustainability [173]; Ecosystem-Oriented Architectures that overcome the challenge by 
automating the search for new algorithms in a scalable architecture, through the evolution of 
software services in a distributed network. 